GENERATION AND SELECTION OF NOVEL BINDING PROTEINS
BACKGROUND OF THE INVENTION

Field of the Invention

This invention relates to development of novel binding proteins by an iterative process of mutagenesis, expression, chromatographic selection, and amplification.

Information Disclosure Statement

The amino acid sequence of a protein determines its three-dimensional (3D) structure, which in turn determines protein function (EPST63, ANFI73). A widely accepted system of classifying protein structure may be found in Schulz and Schirmer (SCHU79, Ch5). Their classification system is adopted herein.

Shortle (SHOR85), Sauer and colleagues (PAKU86, REID88), and Caruthers and colleagues (EISE85) have shown that some residues on the polypeptide chain are more important than others in determining the 3D structure of a protein. The 3D structure is essentially unaffected by the identity of the amino acids at some loci; at other loci only one or a few types of amino acid are allowed. In most cases, loci where wide variety is allowed have the amino acid side group directed toward the solvent. Loci where limited variety is allowed frequently have the side group directed toward other parts of the protein. Thus substitutions of amino acids that are exposed to solvent are less likely to affect the 3D structure than are substitutions at internal loci. (See also SCHU79, p150-171 and CREI84, p239-245, 314-315.)
The secondary structure (helices, sheets, turns, loops) of a protein is determined mostly by local sequence. Certain amino acids tend to be correlated with certain secondary structures, and the commonly used Chou-Fasman rules (CHOU74, CHOU78a, CHOU78b) depend on these correlations. The correlations between amino acid type and secondary structure are not, however, absolute, and every amino acid type has been observed in helices and in both parallel and antiparallel sheets. Kabsch and Sander (KABS84) report on pentapeptides of identical sequence found in different proteins; in some cases the conformations of the pentapeptides are very different. Argos (ARGO87) surveyed pentapeptides of similar sequence in different proteins and found that the structures of the sequence-similar subsequences were frequently different.

The residues that join helices to helices, helices to sheets, and sheets to sheets are called turns and loops and have recently been classified by Richardson (RICH81), Thornton (THOR88), Sutcliffe (SUTC87a), and others. Insertions and deletions are more readily tolerated in loops than elsewhere. Thornton et al. (THOR88) have summarized many observations indicating that related proteins usually differ most at the loops which join the more regular elements of secondary structure.

When the amino acid sequence of one protein has been changed to be more like the sequence of a second protein, the properties of the novel protein usually approach the properties of the second protein. Wells et al. (WELL87a) reported that changing three residues in subtilisin from Bacillus amyloliquefaciens to be the same as the corresponding residues in subtilisin from B. licheniformis produced a protease that had nearly the same activity as the subtilisin from the latter organism. There were 82 differences remaining in the sequences. The three residues changed were chosen because they were the only differences within 7 Angstroms (Å) of the active site.

Many proteins bind non-covalently but very tightly and specifically to some other characteristic molecules. Schulz and Schirmer summarize many observations on the binding of proteins to other proteins (SCHU79, p93-105). For example, haemoglobin alpha chains bind very tightly to haemoglobin beta chains (delta G less than -14.0 kcal/mole); antibodies bind tightly to antigens (Kds range from 10^-8 to 10^-12 M, where Kd is the dissociation constant equal to [A][B]/[A:B]); basic bovine pancreatic trypsin inhibitor (BPTI) binds tightly to trypsin (Kd = 6.0 x 10^-14 M (TSCH87), delta G = -18.0 kcal/mole); and avidin binds to biotin (Kd = 1.3 x 10^-15 M (CREI84, p362)).
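The dissociation constants quoted above can be related to binding free energies through the standard relation delta G = RT ln(Kd / 1 M). The Python sketch below is only an illustration; it assumes a 1 M standard state and 25 °C (298.15 K), and the function names are hypothetical. For the BPTI:trypsin Kd of 6.0 x 10^-14 M it gives roughly -18 kcal/mole. It also shows how the definition Kd = [A][B]/[A:B] fixes the fraction of A that is bound at a given free concentration of B.

```python
import math

R_KCAL = 1.987204e-3  # gas constant in kcal/(mol*K)


def binding_free_energy(kd_molar: float, temp_k: float = 298.15) -> float:
    """Standard free energy of binding (kcal/mol), assuming a 1 M standard state."""
    if kd_molar <= 0:
        raise ValueError("Kd must be positive")
    return R_KCAL * temp_k * math.log(kd_molar)


def fraction_bound(kd_molar: float, free_ligand_molar: float) -> float:
    """Fraction of A present as A:B, rearranged from Kd = [A][B]/[A:B]."""
    return free_ligand_molar / (kd_molar + free_ligand_molar)


for name, kd in [("BPTI:trypsin", 6.0e-14), ("avidin:biotin", 1.3e-15)]:
    print(f"{name}: Kd = {kd:.1e} M, delta G = {binding_free_energy(kd):.1f} kcal/mol")
```

A quick check of `fraction_bound`: when the free concentration of B equals Kd, exactly half of A is bound.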

In each case the binding results from complementarity of the surfaces that come into contact: bumps fit into holes, unlike charges come together, dipoles align, and hydrophobic atoms contact other hydrophobic atoms. Although bulk water is excluded, individual water molecules are frequently found filling space in intermolecular interfaces; these waters usually form hydrogen bonds to one or more atoms of the protein or to other bound water. Thus proteins found in nature have not attained, nor do they require, perfect complementarity to bind tightly and specifically to their substrates. Only in rare cases is there essentially perfect complementarity; then the binding is extremely tight (as, for example, avidin binding to biotin).

The relative importance of electrostatic vs. hydrophobic interactions is not fully understood (SCHU79, p108). Attraction between oppositely charged groups apparently contributes little to the free energy of binding between proteins and other molecules. Like-charged groups can, however, increase specificity: repulsion of like-charged groups in the binding interface, or even unpaired charges in the interface, can greatly reduce or eliminate binding in instances where shape and hydrophobic interactions would otherwise induce it.
It has been observed, however, that proteins can bind to other molecules such that like-charged atoms are juxtaposed; in such instances repulsion is reduced or eliminated by inclusion of oppositely charged ions in the binding interface. An example of this phenomenon is the inclusion of two positively charged calcium ions between each pair of subunits of turnip crinkle virus (HOGL86). The subunits each contain two negatively charged D (single-letter amino acid codes are given in Table 1) and E residues in close proximity.

The factors affecting protein binding are known (CHOT75, CHOT76, SCHU79, p98-107, and CREI84, Ch9), but designing new complementary surfaces has proved difficult. Although some rules have been developed for substituting side groups (SUTC87b), the side groups of proteins are floppy and it is difficult to predict what conformation a new side group will take. Further, the forces that bind proteins to other molecules are all relatively weak, and it is difficult to predict the effects of these forces.

Recently, Quiocho and collaborators (QUIO87) elucidated the structures of several periplasmic binding proteins from Gram-negative bacteria. They found that the proteins, despite having low sequence homology and differences in structural detail, have certain important similarities. Each of the proteins they investigated is composed of two domains that are joined by three strands of protein. The binding site is located between the two domains and is isolated from bulk solvent. The structure of the binding site is dense and highly ordered, and binding constants are very high. The researchers suggest that binding of ligands causes a conformational change that alters the relative positions of the two domains.

The researchers found that each of the periplasmic binding proteins has numerous residues (seven or more) arrayed about the binding site. Surprisingly, ionic ligands are not bound by ionic side groups of opposite charge, but by main-chain components. Electrical charge seems to be neutralized by dipole interactions. Further, hydrophobic contacts play an important role in binding.

Based on their investigations of these binding proteins, Quiocho et al. suggest it is unlikely that, using current protein engineering methods, proteins can be constructed with binding properties superior to those of proteins that occur naturally.
Wilkinson et al. (WILK84) have found, however, that enzyme-substrate affinity may be increased by protein engineering. They reported that a mutant of tyrosyl tRNA synthetase of Bacillus stearothermophilus that has proline at residue 51 exhibits a 100-fold increase in affinity for ATP.



Substitution of one amino acid for another at a surface locus may profoundly alter binding properties of the protein other than substrate binding, without affecting the tertiary structure of the protein. For example, in sickle-cell haemoglobin the change of the surface residue E6 to V in the beta chains causes deoxyhaemoglobin-S to form fibers through self binding (DICK83, p125-145). Love and others have shown that the tertiary and quaternary structures of the haemoglobin are not changed (PADL85, WISH75, WISH76).

Tan and Kaiser (TANK77) and Tschesche et al. (TSCH87) showed that changing a single amino acid in BPTI greatly reduces its binding to trypsin, but that some of the new molecules retain the parental characteristics of binding to and inhibiting chymotrypsin, while others exhibit new binding to elastase. Caruthers and others (EISE85) have shown that changes of single amino acids on the surface of the lambda Cro repressor greatly reduce its affinity for the natural operator OR3, but greatly increase the binding of the mutant protein to a mutant operator. Thus changing the surface of a binding protein may alter its specificity without abolishing binding activity.



The recently developed techniques of "reverse genetics" have been used to produce single specific mutations at precise base pair loci (OLIP86, OLIP87). Mutations are generally detected by sequencing and in some cases by loss of wild-type function. These procedures allow researchers to analyze the function of each residue in a protein (MCLL83) or of each base pair in a regulatory DNA sequence (CMEI86a). In these analyses, the norm has been to strive for the classical goal of obtaining mutants carrying a single alteration (AUSU87).

Reverse genetics is frequently applied to coding regions to determine which residues are most important to the protein structure. In such studies, isolation of a single mutant at each residue of the protein gives an initial estimate of which residues play crucial roles.

Prior to the method of the present invention, two general approaches have been developed to create novel mutant proteins through reverse genetics. Both methods start with a clone of the gene of interest. In one approach, dubbed "protein surgery" (reviewed by Dill (DILL87)), a specific substitution is introduced at a single protein residue by a synthetic method using the corresponding natural or synthetic cloned gene. Craik et al. (CRAI85), Rao et al., and Bash et al. (BASH87) have used this approach to determine the effects on structure and function of specific substitutions in trypsin.

The other approach has been to generate a variety of mutants at many loci within the cloned gene, the "gene-directed random mutagenesis" method. The specific location and nature of the change are determined by DNA sequencing. It may be possible to screen for mutations if loss of a wild-type function confers a cellular phenotype. Using immunoprecipitation, one can then differentiate among mutant proteins that: a) fold but fail to function, b) fail to fold but persist, and c) are degraded, perhaps due to failure to fold. This approach is exemplified by the work of Pakula et al. (PAKU86) on the effect of point mutations on the structure and function of the Cro protein from bacteriophage lambda. This approach is limited by the number of colonies that can be examined. An additional important limitation is that many desirable protein alterations require multiple amino acid substitutions and thus are not accessible through single base changes or even through all possible amino acid substitutions at any one residue.
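Why single base changes cannot reach every amino acid substitution can be checked directly against the standard genetic code. This Python sketch (all names are our own, for illustration) builds the code table and lists the amino acids reachable from one codon by a single base change. For the alanine codon GCT, only 6 of the 19 alternative amino acids are reachable.

```python
BASES = "TCAG"
# Standard genetic code (NCBI translation table 1). Codons are ordered
# TTT, TTC, TTA, TTG, TCT, ... with the first base varying slowest.
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODE = {
    b1 + b2 + b3: AMINO_ACIDS[16 * i + 4 * j + k]
    for i, b1 in enumerate(BASES)
    for j, b2 in enumerate(BASES)
    for k, b3 in enumerate(BASES)
}


def single_base_neighbours(codon: str) -> set[str]:
    """Amino acids reachable from `codon` by one base change (stops and silent changes excluded)."""
    codon = codon.upper()
    original = CODE[codon]
    reachable = set()
    for pos in range(3):
        for base in BASES:
            if base == codon[pos]:
                continue
            aa = CODE[codon[:pos] + base + codon[pos + 1:]]
            if aa not in ("*", original):
                reachable.add(aa)
    return reachable


reachable = single_base_neighbours("GCT")  # an alanine codon
print(sorted(reachable), f"{len(reachable)} of 19 alternative amino acids")
```

Reaching the other 13 amino acids from GCT requires two or three base changes in the same codon, which is exactly what single-base methods do not sample.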

The objective in both the surgical and gene-directed random mutagenesis approaches has been, however, to analyze the effects of a variety of single substitution mutations, so that rules governing such substitutions could be developed (ULME83). Progress has been greatly hampered by the extensive efforts involved in using either method and the practical limitations on the number of colonies that can be inspected (RODE86).

The term "saturation mutagenesis" with reference to synthetic DNA is generally taken to mean generation of a population in which: a) every possible single-base change within a fragment of a gene or DNA regulatory region is represented, and b) most mutant genes contain only one mutation. Thus a set of all possible single mutations for a 6 base pair length of DNA comprises a population of 18 mutants (6 positions x 3 alternative bases). Oliphant et al. (OLIP86) and Oliphant and Struhl (OLIP87) have demonstrated ligation and cloning of highly degenerate oligonucleotides and have applied saturation mutagenesis to the study of promoter sequence and function. They have suggested that similar methods could be used to study genetic expression of protein coding regions of genes, but they do not say how one should: a) choose protein residues to vary, or b) select or screen mutants with desirable properties.
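A short Python sketch that enumerates every single-base substitution of a DNA fragment makes the count explicit: each of the L positions can change to 3 other bases, so the population has 3L members, or 18 for a 6 bp fragment. The sequence used here is an arbitrary example, and the function name is hypothetical.

```python
def single_base_mutants(seq: str) -> list[str]:
    """Every sequence that differs from `seq` at exactly one position."""
    seq = seq.upper()
    mutants = []
    for pos, wild_type in enumerate(seq):
        for base in "ACGT":
            if base != wild_type:
                mutants.append(seq[:pos] + base + seq[pos + 1:])
    return mutants


library = single_base_mutants("GAATTC")  # arbitrary 6 bp example
print(len(library))  # 6 positions x 3 alternative bases = 18
```

Note that condition b) of the definition (most genes carry only one mutation) concerns how the physical library is synthesized; the enumeration above only describes which sequences must be represented.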

Reidhaar-Olson and Sauer (REID88) have used synthetic degenerate oligonucleotides to vary simultaneously two or three residues through all twenty amino acids in the dimer interface of cI repressor from bacteriophage lambda. They give no discussion of the limits on how many residues could be varied at once, nor do they mention the problem of unequal abundance of DNA encoding different amino acids. They looked for proteins that either had wild-type dimerization or that did not dimerize. They did not seek proteins having novel binding properties and did not find any.
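The "unequal abundance" problem can be made concrete by counting how many codons of a degenerate codon encode each amino acid. In this Python sketch, the NNN and NNK patterns (IUPAC notation: N = any base, K = G or T) are illustrative choices and are not taken from the work described above. With NNN, Leu, Ser, and Arg are each encoded six times as often as Met or Trp, and 3 of 64 codons are stops; with NNK the ratio falls to 3 and only 1 of 32 codons is a stop.

```python
from collections import Counter
from itertools import product

BASES = "TCAG"
# Standard genetic code (NCBI table 1), first base varying slowest.
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODE = {
    b1 + b2 + b3: AMINO_ACIDS[16 * i + 4 * j + k]
    for i, b1 in enumerate(BASES)
    for j, b2 in enumerate(BASES)
    for k, b3 in enumerate(BASES)
}
# Subset of the IUPAC degenerate-base symbols.
IUPAC = {"A": "A", "C": "C", "G": "G", "T": "T", "N": "ACGT", "K": "GT"}


def codon_usage(pattern: str) -> Counter:
    """Number of codons encoding each amino acid ('*' = stop) in a degenerate codon."""
    choices = [IUPAC[symbol] for symbol in pattern.upper()]
    return Counter(CODE["".join(codon)] for codon in product(*choices))


def stop_free_fraction(pattern: str, n_codons: int) -> float:
    """Fraction of clones free of stop codons when n_codons positions use `pattern`."""
    usage = codon_usage(pattern)
    return (1.0 - usage["*"] / sum(usage.values())) ** n_codons


for pattern in ("NNN", "NNK"):
    usage = codon_usage(pattern)
    coding = [n for aa, n in usage.items() if aa != "*"]
    print(f"{pattern}: {sum(usage.values())} codons, {usage['*']} stop, "
          f"{len(coding)} amino acids, max/min = {max(coding) / min(coding):.0f}")
```

`stop_free_fraction` shows one reason the number of simultaneously varied residues is limited: under the stated assumptions every added NNK codon multiplies the stop-free fraction by 31/32.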

Several researchers have designed and synthesized proteins de novo. These designed proteins are small, and most have been synthesized in vitro as polypeptides rather than genetically. Gutte and colleagues have made a polypeptide that binds DDT in ethanol (MOSE83). Recently Moser et al. (MOSE87) reported genetic expression in E. coli both of the designed 24-residue DDT-binding protein and of fusions of the DDT-binding sequence to LacZ. They state that design of biologically active proteins is currently impossible.

Erickson et al. (ERIC86) have designed and synthesized a series of proteins, which they have named betabellins, that are meant to have beta sheets. They suggest use of polypeptide synthesis with mixed reagents to produce several hundred analogous betabellins. They suggest the mixture be passed over a column to recover the analogues with high affinity for a chosen target compound bound to the column. They envision successive rounds of mixed synthesis of variant proteins and purification by specific binding. They do not discuss how residues should be chosen for variation. Because proteins cannot be amplified, the researchers must sequence the recovered protein to learn which substitutions improve binding. The researchers must limit the level of diversity so that each variety of protein will be present in sufficient quantity for the isolated fraction to be sequenced.
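The trade-off between diversity and quantity can be sketched with simple arithmetic. Assuming, hypothetically, that one mixed synthesis yields 1 micromole of product and that all variants are equally represented, the amount of each variant falls as 20^n when n residues are varied through all twenty amino acids. The function names below are illustrative only.

```python
def variant_count(n_varied: int, choices_per_residue: int = 20) -> int:
    """Number of distinct sequences when each of n residues takes any of the choices."""
    return choices_per_residue ** n_varied


def moles_per_variant(total_moles: float, n_variants: int) -> float:
    """Amount of each variant, assuming every variant is equally represented."""
    return total_moles / n_variants


TOTAL_MOLES = 1e-6  # hypothetical yield of one mixed synthesis (1 micromole)
for n in range(1, 6):
    count = variant_count(n)
    pmol = moles_per_variant(TOTAL_MOLES, count) * 1e12
    print(f"{n} residue(s) varied: {count:>9,} variants, {pmol:>10,.2f} pmol of each")
```

Because a recovered protein must be sequenced directly, the usable diversity is capped by how little material a sequencing step can tolerate; a replicable genetic package avoids this cap because a single selected copy can be grown up.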



A number of methods have been developed to separate cells through their affinity to various substances. Bonnafous et al. (BONN85) review methods that have been applied to animal cells, and cite two common problems: a) non-specific interactions between cells and affinity supports, and b) irreversible binding of cells to affinity matrices. Possible reasons for irreversible binding include multiple points of attachment and very high affinity between cells and antibodies used as affinity materials. Chromatographic separation of animal cells is still difficult because of their fragility. Bacterial cells, bacterial spores, and some bacteriophages, however, are sturdier than animal cells and have been fractionated based on proteins displayed on their surfaces.
Ferenci and collaborators have published a series of papers on the chromatographic isolation of mutants of the maltose-transport protein LamB of E. coli (WAND79, FERE80a, FERE80b, FERE80c, FERE82a, FERE82b, FERE84, CLUN84, FERE86a, FERE86b, FERE86c, FERE87a, FERE87b, HEIN87, and HEIN88). The papers report that spontaneous and induced mutants at the lamB genetic locus can be isolated by chromatography over a column supporting immobilized maltose, maltodextrins, or carbohydrates that could be metabolized by the bacteria. The reports speculate that other applications are possible, but specifically mention only the elucidation of the residues responsible for the selectivity of the maltodextrin pore or similar pore proteins.

Ferenci's experiments measured the combination of the individual affinity of mutant LamB molecules and the level of expression. Several classes of mutants in lamB were isolated. One class had higher affinities for both maltose and starch, one class had lower affinity for starch but higher affinity for maltose, and another class had higher affinity for starch but lower affinity for maltose.

Mutants were generated either by hydroxylamine treatment of a plasmid carrying the entire gene, or by insertions of two extra codons at natural HpaII sites. Levels of mutagenesis were picked to provide single point mutations or single insertions of two residues. No multiple mutations were sought or found.

LamB is a large trimeric integral membrane protein; such proteins are very difficult to crystallize or even to solubilize. Therefore it is difficult to use single-crystal X-ray crystallography or NMR to obtain detailed 3D structural information. Garavito et al. have obtained crystals of LamB that diffract X-rays, but the 3D structure of the protein has not been reported. There are models (GEHR87, HEIN88) of the secondary structure of LamB, indicating which residues are in beta-sheet conformation, which residues are in turns, and which residues are on the outside, in the periplasm, or in the membrane. These models do not specify how the beta sheets are arranged nor which turns are close to which other turns.



Ferenci and Lee reported on the temperature sensitivity of carbohydrate binding in B. stearothermophilus. At higher temperatures, the organism breaks down the polysaccharide, the binding of which was the object of the study. Clune, Lee, and Ferenci reported that the presence of complete O-antigen affected the binding properties of LamB on the surface of E. coli. Both of these reports point up the difficulties of working with live bacteria that can metabolize chemicals and change their physiological behavior during the chromatographic experiment. Heine et al. (HEIN88) have recently used the chemotaxis of E. coli to isolate mutants in LamB that are unaffected in chemotaxis; this approach is limited to metabolites that affect chemotaxis.

Makela et al. (MAKE80) reviewed methods that involve chemically coupling antigens to bacteriophage to produce a sensitive, quantitative detection system for antibodies. The methods reviewed exploit the ability to amplify the signal produced by antibodies binding to the antigens coupled to the phage, through growth of the phage. The antigens were joined to the phage chemically and not encoded in the genes of the phage. Thus there was no sorting of genetic material. Furthermore, the objectives of the methods reviewed involve titering the phage that fail to bind, as an assay of antibody. The methods of the present invention, in most cases, involve growth and amplification of genetic packages that bind with high affinity.

In 1985 Smith (SMIT85) reported inserting a heterologous gene into gene III of bacteriophage f1. The gene III protein is a minor coat protein necessary for infectivity. In some cases the inserted gene preserved the original reading frame, leading to expression of heterologous protein as an inserted domain in the gene III protein. Smith demonstrated that virions of the resulting f1 strain are adsorbed by antibody against the protein encoded by the heterologous DNA. The antibody was bound to a polystyrene petri dish. The phage were eluted at pH 2.2 and retained some infectivity. However, the single copy of f1 gene III was used for insertion of the heterologous gene, so that all copies of gene III protein were affected; infectivity of the resultant phage was reduced 25-fold. Smith also demonstrated that batch elution from a plate can separate f1 virions that differ by only a few protein domains on their surfaces.
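Because eluted phage can be regrown, an affinity separation of this kind can be repeated, and its effect compounds. The Python sketch below models one round as binding, washing, and elution followed by amplification. The retention probabilities and the starting frequency are hypothetical numbers, and the model assumes binders and non-binders grow equally well, so each round multiplies the odds of picking a binder by keep_binder / keep_nonbinder.

```python
def select_round(binder_fraction: float, keep_binder: float, keep_nonbinder: float) -> float:
    """Binder fraction after one bind/wash/elute round followed by amplification.

    Amplification is assumed to multiply binders and non-binders equally, so only
    the retention probabilities change the composition of the population.
    """
    kept_b = binder_fraction * keep_binder
    kept_n = (1.0 - binder_fraction) * keep_nonbinder
    return kept_b / (kept_b + kept_n)


def rounds_needed(start: float, keep_binder: float, keep_nonbinder: float,
                  goal: float = 0.9, max_rounds: int = 50) -> int:
    """Number of rounds until binders make up at least `goal` of the population."""
    fraction = start
    for n in range(1, max_rounds + 1):
        fraction = select_round(fraction, keep_binder, keep_nonbinder)
        if fraction >= goal:
            return n
    raise RuntimeError("goal not reached; is keep_binder larger than keep_nonbinder?")


# Hypothetical numbers: 1 binder per 10^6 packages; 10% of binders and 0.1% of
# non-binders survive each round, i.e. 100-fold enrichment per round.
fraction = 1e-6
for n in range(1, 5):
    fraction = select_round(fraction, 0.1, 1e-3)
    print(f"after round {n}: binder fraction = {fraction:.3g}")
print("rounds to reach 90% binders:", rounds_needed(1e-6, 0.1, 1e-3))
```

Under these assumptions, four rounds take a one-in-a-million binder to about 99% of the population, whereas a single, non-amplifiable separation would stop at about 0.01%.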



Smith presented his method as a way to isolate cloned genes using antibodies to the gene products. He made no mention of mutagenizing the inserted genetic material or of inducing novel binding properties in the inserted protein domain.

De la Cruz et al. (CRUZ88) have expressed a fragment of the repeat region of the circumsporozoite protein from Plasmodium falciparum on the surface of M13 as an insert in the gene III protein. They showed that the recombinant phage were both antigenic and immunogenic in rabbits, and that such recombinant phage could be used for B epitope mapping. The researchers suggest that similar recombinant phage could be used for T epitope mapping and for vaccine development. They do not suggest mutagenesis of the inserted material.

Gene fragments coding for portions of hepatitis B virus antigens have been fused to fragments of lamB. If the point of fusion is in a region coding for exposed domains of LamB, the HBV antigens appear on the cell surface and are immunogenic (CHAR87). Charbit et al. (CHAR87) suggest use of these engineered strains for development of a live bacterial vaccine; they have not reported interest in mutagenesis of the fused heterologous gene fragments, nor in development of binding capabilities.

Recently Tjian and colleagues (KADO86, BRIG87, and JONE87) have shown that DNA of definite sequence bound to an affinity column can be used to purify proteins that bind the DNA sequence-specifically. The proteins are purified as much as 1000-fold in two chromatographic steps or 53-fold in a single step.

Patents and patent applications which may be of interest include US Patent No. 4,704,692, "Computer Based System and Method for Determining and Displaying Possible Chemical Structures for Converting Double- or Multiple-Chain Polypeptides to Single-Chain Polypeptides" (Ladner '692), issued to Robert Charles Ladner on 3 November 1987 and assigned to Genex Corporation. Ladner '692 describes a design method for converting proteins composed of two or more chains into proteins of fewer polypeptide chains, but with essentially the same 3D structure. There is no mention of variegated DNA and no genetic selection.

Robert Charles Ladner has six patent applications pending before the USPTO and assigned to Genex Corporation:

07/92,110
07/21,046
07/21,047
07/34,964
07/34,965
07/34,956



Sonia Guterman is named as a joint inventor on US Patent No. 4,745,056 ("Streptomyces Secretion Vector") and on Ser. No. 21,463.



None of the Ladner or Guterman patents or applications is believed to disclose or suggest the present invention, but it is requested that each be considered by the Examiner.

No admission is made that any cited reference is prior art or pertinent prior art, and the dates given are those appearing on the reference and may not be identical to the actual publication date.
SUMMARY OF THE INVENTION



This invention relates to the construction, expression, and selection of mutated genes that specify novel proteins with desirable binding properties, as well as these proteins themselves. The substances bound by these proteins, hereinafter referred to as "targets", may be, but need not be, proteins. Targets may include other biological or synthetic macromolecules as well as organic and inorganic molecules.

The novel binding proteins may be obtained: 1) by mutating a gene encoding a known binding protein within the subsequence encoding a known binding domain, 2) by taking such a subsequence of the gene for a first protein and combining it with all or part of a gene for a second protein (which may or may not be itself a known binding protein), 3) by mutating a gene encoding a protein which, while not possessing known binding activity, possesses a secondary or higher structure that lends itself to binding activity (clefts, grooves, etc.), or 4) by mutating a gene encoding a known binding protein but not in the subsequence known to cause the binding. The protein from which the novel binding protein is derived need not have any specific affinity for the target material.



In one embodiment, the invention relates to:

a) preparing a variegated population of replicable genetic packages, each package including a nucleic acid construct coding on expression for an outer-surface-displayed potential binding protein comprising (i) a structural signal directing the display of the protein on the outer surface of the package and (ii) a potential binding domain for binding said target, where a plurality of different potential binding domains are displayed by the individual packages;
The invention further relates to a method of preparing a mixed population of replicable genetic packages in which each package includes a gene expressing a potential binding protein in such a manner that the protein is presented on the outer surface of the package. This method comprises:

i) preparing a variegated population of DNA inserts, each of which comprises a first sequence which codes on expression for a potential binding domain and a second sequence encoding a signal directing that the encoded protein be displayed on the outer surface of a chosen replicable genetic package, and

ii) incorporating the resulting population of DNA constructs into the chosen replicable genetic packages to produce a population of replicable genetic packages.

In a preferred embodiment, the potential-binding-protein-encoding inserts are incorporated into a gene encoding an outer-surface protein of the replicable genetic package.
The invention encompasses the design and synthesis of variegated DNA encoding a family of potential binding proteins characterized by constant and variable regions, said proteins being designed with a view toward obtaining a protein that binds a predetermined target.
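A population of inserts with constant and variable regions can be sketched by joining a fixed display-signal sequence to a binding-domain template written with degenerate IUPAC bases. In this Python sketch the sequences SIGNAL and DOMAIN are made-up placeholders, not sequences from this specification, and drawing bases with a pseudo-random generator merely stands in for synthesis with mixed nucleotides.

```python
import random

# Subset of the IUPAC degenerate-base symbols (N = any base, K = G or T).
IUPAC = {"A": "A", "C": "C", "G": "G", "T": "T", "N": "ACGT", "K": "GT"}


def template_diversity(template: str) -> int:
    """Number of distinct DNA sequences a degenerate template can encode."""
    count = 1
    for symbol in template.upper():
        count *= len(IUPAC[symbol])
    return count


def variegate(template: str, rng: random.Random) -> str:
    """Draw one member of the population encoded by a degenerate template."""
    return "".join(rng.choice(IUPAC[symbol]) for symbol in template.upper())


def build_insert(display_signal: str, domain_template: str, rng: random.Random) -> str:
    """Join the constant display-signal region to one variegated binding-domain region."""
    return display_signal + variegate(domain_template, rng)


SIGNAL = "ATGAAAAAACTGCTG"      # hypothetical constant region
DOMAIN = "GCTNNKGGTTGCNNKGAA"   # hypothetical domain: two varied NNK codons

rng = random.Random(0)
library = [build_insert(SIGNAL, DOMAIN, rng) for _ in range(5)]
for member in library:
    print(member)
print("distinct DNA sequences possible:", template_diversity(DOMAIN))
```

Keeping the display signal constant and confining variation to chosen codons of the domain is what lets every member of the population be displayed in the same way while differing only at the loci selected for variation.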

For the purposes of this invention, the term "potential binding protein" refers to a protein encoded by one species of DNA molecule in a population of
[...] proteins that in fact bind to the target ("successful binding domains"). After one or more rounds of such enrichment, one or more of the chosen genes are examined and sequenced. If desired, new loci of variation are chosen. The selected daughter genes of one generation then become the parent sequences for the next generation of variegated DNA, beginning the next "variegation cycle." Such cycles are continued until a protein with the desired target affinity is obtained.
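The variegation cycle can be illustrated with a toy simulation. This is not the chromatographic selection itself: the `affinity` function below is a hypothetical stand-in that simply counts matches to a hidden "ideal" sequence, and keeping the single best member of each library is a simplification of enrichment over binding rounds. What the sketch does show is the loop structure: choose loci, variegate them, select, and use the selected daughter as the parent of the next cycle.

```python
import random

ALPHABET = "ACDEFGHIKLMNPQRSTVWY"  # the 20 amino acids, one-letter codes


def affinity(seq: str, ideal: str) -> int:
    """Hypothetical stand-in for target affinity: positions matching an ideal binder."""
    return sum(a == b for a, b in zip(seq, ideal))


def variegation_cycle(parent: str, loci: list[int], ideal: str,
                      library_size: int, rng: random.Random) -> str:
    """Vary the chosen loci of the parent and keep the best-binding daughter."""
    best = parent  # the parent itself competes, so affinity never decreases
    for _ in range(library_size):
        child = list(parent)
        for locus in loci:
            child[locus] = rng.choice(ALPHABET)
        candidate = "".join(child)
        if affinity(candidate, ideal) > affinity(best, ideal):
            best = candidate
    return best


rng = random.Random(1)
ideal = "".join(rng.choice(ALPHABET) for _ in range(12))   # hidden, hypothetical
parent = "".join(rng.choice(ALPHABET) for _ in range(12))  # starting sequence
history = [affinity(parent, ideal)]
for cycle in range(1, 7):
    loci = rng.sample(range(len(parent)), 3)  # new loci of variation each cycle
    parent = variegation_cycle(parent, loci, ideal, 2000, rng)
    history.append(affinity(parent, ideal))
    print(f"cycle {cycle}: {parent}  affinity {history[-1]}/{len(ideal)}")
```

Varying three loci at a time lets each cycle test combinations that single substitutions would miss, while carrying the selected daughter forward keeps the gains of earlier cycles.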

The appended claims are hereby incorporated by reference into this specification as an enumeration of the preferred embodiments.

BRIEF DESCRIPTION OF THE DRAWINGS

Figure 1 is a schematic showing the relationships between various types of Binding Domains (BD).

Figure 2 is a flow chart showing the major steps used to create a novel protein with affinity for a predetermined target.

Figure 3 is a stereo view of a molecular model of the coat of the bacteriophage f1.

Figure 4 is a schematic of a PBD contacting a molecule of target material.

Figure 5 is a stereo view of a hypothetical interaction between BPTI and myoglobin.

Figure 6 is a schematic of the binding surface of a PBD at various stages in the process of selecting a successful binding domain for a hypothetical target.



Bacterial Cells as Genetic Packages
Preferred Bacterial Cells for Use as GPs
Preferred Outer Surface Proteins
Displaying IPBDs on Bacterial Cells
Choice of Insertion Site for IPBD in Bacterial Cell OSP

In Vivo Selection for Pseudo-OSP Gene from Random DNA Inserts in Bacterial Cells

Displaying IPBD on Bacterial Spores

Preferred Bacterial Spores for Use as GPs

Preferred Outer-Surface Proteins for Displaying IPBD on Bacterial Spores

Choice of Insertion Site for IPBD in OSP

In Vivo Selection for Pseudo-OSP Gene from Random DNA Inserts in Bacterial Spores

Displaying IPBD on Outer Surface of Phages

Preferred Phages for Use as GPs

Preferred OSPs for Displaying IPBDs on Phages

Choice of Insertion Site for IPBD in OSP

In Vivo Selection for Pseudo-OSP Gene from Random DNA Inserts in Phages

Choice of IPBD

Influence of target size on choice of IPBD
Influence of target charge on choice of IPBD
Other considerations in the choice of IPBD

Choice of OCV

Designing the osp-ipbd gene insert
Genetic regulation of the osp-ipbd gene
DNA sequence design
Specific DNA sequence assignment
13.1.2 The Secondary Set

13.1.3 Choice of Residues to Vary Initially

13.2 Choosing range of variation

13.3 Design of vgDNA Encoding PBD Family

14.1 Insertion of synthetic vgDNA into plasmids

14.2 Transformation of cells

14.3 Growth of the GP(vgPBD) population

15. Isolation of GP(SBD)s with binding-to-target phenotypes

15.1 Attaching the target material to a column

15.2 Reducing selection due to non-specific binding

15.3 Eluting the column

15.4 Recovery of packages

15.5 Amplifying the enriched packages

15.6 Determining whether further enrichment is needed

15.7 Characterizing population
15.8 Testing of binding affinity
15.9 Other Affinity Separation Means

16.0 The Next Variegation Cycle

17.0 OTHER CONSIDERATIONS

17.1 Joint selections

17.2 Selection for non-binding

17.3 Selection of PBDs for retention of structure

17.4 Created binding proteins not unique

17.5 Other modes of mutagenesis possible

Example 1 Derivation of Novel Binding Protein for Myoglobin Using BPTI as IPBD, M13 as GP, and the Gene VIII Protein as OSP.



bind a chosen target, it is referred to herein as a "binding domain" (BD). A preliminary operation is to engineer the appearance of a stable protein domain, denoted as an "initial potential binding domain" (IPBD), on the surface of a genetic package. The present invention is concerned with the expression of numerous, diverse, variant "potential binding domains" (PBD), all related to a "parental potential binding domain" (PPBD) such as the binding domain of a known binding protein, and with selection and amplification of the genes encoding the most successful mutant PBDs. An IPBD is chosen as the PPBD for the first round of variegation. Selection-through-binding isolates one or more "successful binding domains" (SBD). An SBD from one round of variegation and selection-through-binding is chosen to be the PPBD for the next round. The invention is not, however, limited to proteins with a single BD, since the method may be applied to any or all of the BDs of the protein, sequentially or simultaneously. The relationships of the various BDs are illustrated in Figure 1.

Conventionally, DNA sequences are written from 5' to 3', left-to-right, showing only the sequence that will appear as mRNA (with each T of DNA changed to U in mRNA).

protein:            M - L - F - ...

anti-sense DNA: 5' ATG CTT TTC ... 3'
sense DNA:      3' TAC GAA AAG ... 5'

mRNA:           5' AUG CUU UUC ... 3'
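A short sketch of the convention in the example: the strand written 5' to 3' carries the mRNA sequence (T read as U), and its base-by-base complement is the template. The function names are illustrative only. The comments follow this document's labels, in which the written strand is "anti-sense" and the template is "sense"; note that much later literature uses the opposite labels.

```python
# Watson-Crick pairing for DNA bases.
COMPLEMENT = {"A": "T", "T": "A", "G": "C", "C": "G"}


def complementary_strand(written_5to3):
    """Base-by-base complement, read 3'->5' so it lines up under the written strand."""
    return "".join(COMPLEMENT[base] for base in written_5to3.upper())


def mrna_from_written_strand(written_5to3):
    """The written strand carries the mRNA sequence, with each T changed to U."""
    return written_5to3.upper().replace("T", "U")


written = "ATGCTTTTC"  # the strand this document calls "anti-sense"
print(complementary_strand(written))      # TACGAAAAG (3'->5', the document's "sense" strand)
print(mrna_from_written_strand(written))  # AUGCUUUUC
```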

The complementary strand is the one used as template for mRNA synthesis and so is called the "sense strand"; we will use this convention throughout. Although this
the analyte can be freed from the affinity material once the impurities are washed away.



Affinity column chromatography involves chemically attaching the affinity material to an inert solid support matrix that is held in a column so that solutions can be passed over the matrix in a controlled way. Mixtures that might contain the analyte are passed over the matrix, to which any analyte component in the mixture adheres. Separation is achieved by passing a gradient of some type over the matrix and collecting fractions. It is also possible to recover purified material from the matrix by other means after impurities have been washed away.

An alternative to column affinity chromatography is batch elution from an affinity matrix material held in some container. Affinity material is chemically bound to the matrix. A mixture that might contain the analyte is added and the matrix is rinsed with buffer. The material is then rinsed with a series of buffers containing increasing concentrations of solutes chosen to wash impurities away. The analyte is recovered in purified form either in one of the buffer fractions or bound to the matrix.
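The two possible outcomes of batch elution (analyte in one of the buffer fractions, or analyte still bound to the matrix) can be illustrated with a deliberately simple model in which each bound species is released once the wash concentration reaches a single threshold. Real desorption is gradual, so the function `step_elution`, the species names, and the threshold concentrations below are all hypothetical.

```python
def step_elution(bound, wash_steps):
    """Toy model of batch elution with a rising solute concentration.

    bound: species name -> hypothetical release concentration; a species stays on
           the matrix until a wash reaches that concentration.
    wash_steps: wash concentrations, in increasing order.
    Returns (fractions, still_bound): the species released by each wash, and the
    species that remain bound after the last wash.
    """
    remaining = dict(bound)
    fractions = []
    for conc in wash_steps:
        released = sorted(name for name, release in remaining.items() if release <= conc)
        for name in released:
            del remaining[name]
        fractions.append(released)
    return fractions, sorted(remaining)


fractions, still_bound = step_elution(
    {"impurity_a": 0.05, "impurity_b": 0.2, "analyte": 0.8},  # hypothetical values (M)
    [0.1, 0.3, 0.5],
)
print(fractions)    # [['impurity_a'], ['impurity_b'], []]
print(still_bound)  # ['analyte']
```

Here the washes stop below the analyte's release point, so the analyte is recovered "bound to the matrix"; adding a stronger final wash would move it into the last fraction instead.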



Another alternative to column affinity chromatography is batch elution from a plate. The affinity material can be chemically bound to a flat surface, such as the bottom of a polystyrene petri dish. A mixture that might contain the analyte is added to the plate and the plate is rinsed with a buffer. Subsequently, the plate is washed with a series of buffers containing increasing concentrations of solutes chosen to separate components having lower affinity for




or cells. It has been used to separate bacteriophages on the basis of charge (SERW87).



The present invention makes use of affinity separation of bacterial cells, or bacterial viruses (or other genetic packages), to enrich a population for those cells or viruses carrying genes that code for proteins with desirable binding properties.

In the present invention, the words "grow", "growth", "culture", and "amplification" mean increase in number, not increase in size of individual cells or phage. In the present invention, the words "select" and "selection" are used in the genetic sense, i.e., a biological process whereby a phenotypic characteristic is used to enrich a population for those organisms displaying the desired phenotype. Choices or elections to be made by humans are indicated by "choose", "pick", "take", etc., but not "select".

The process of the present invention comprises three major parts:

I. design and production of a replicable genetic package (GP) that displays an IPBD on the surface of the GP; the combination is denoted GP(IPBD);

II. design and implementation of an affinity separation process that separates GP(IPBD)s that bind to a known affinity molecule from wild-type GPs or GP(IPBD')s, neither of which binds the known affinity molecule; and



3) designing an amino acid sequence that: a) includes the IPBD as a subsequence and b) will cause the IPBD to appear on the GP surface (Secs. 1.1.2, 1.2.2, 1.3.2, and 4),

4) engineering a gene, denoted osp-ipbd, that: a) codes for the designed amino acid sequence, b) provides the necessary genetic regulation, and c) introduces convenient sites for genetic manipulation (Secs. 4.1, 4.2, 4.3, 5.1, and 5.2),

5) cloning the osp-ipbd gene into the GP (Sec. 6.1), and

6) harvesting the transformed GPs (Sec. 7) and testing them for presence of IPBD on the GP surface (Sec. 3); this test is performed with an affinity molecule having high affinity for IPBD, denoted AfM(IPBD).

In another preferred embodiment, Part I of the process involves:

1) choosing a GP such as a bacterial cell (Sec. 1.1.1), bacterial spore (1.2.1), or phage (1.3.1) having a suitable outer surface protein (Secs. 1.1.2, 1.2.2 and 1.3.2),

2) choosing a stable IPBD (Sec. 2),

3) designing a DNA sequence that: a) encodes the IPBD as a subsequence and b) contains suitable restriction sites so that random DNA may be operably linked to the ipbd gene fragment; and c)



domain. References to PBD or SBD in Part I are to indicate a preparatory intent.
In Part II we optimize separation of GP(IPBD) from wild-type GP, denoted wtGP, based on the affinity of IPBD for AfM(IPBD). To establish the sensitivity of the affinity separation process, we separate small amounts of GP(IPBD) from much larger amounts of wtGP. In a preferred embodiment, Part II of the process of the present invention involves:

1) preparing affinity columns bearing AfM(IPBD) at various densities of AfM(IPBD)/(volume of matrix) (Sec. 10.1),

2) preparing GP(IPBD)s with various amounts of IPBD per GP,
3) picking a gradient regime for eluting the columns (Sec. 10.1),

4) determining which combination of: a) IPBD/GP, b) density of AfM(IPBD)/(volume of support), c) initial ionic strength, d) elution rate, and e) (amount of GP)/(volume of support) loaded, gives the best separation of GP(IPBD) from wtGP (Sec. 10.1),

5) determining the smallest amount of GP(IPBD) that can be isolated from a much larger amount of wtGP using the optimal conditions (Sec. 10.2), and

6) determining the efficiency of the affinity separation procedure (Sec. 10.3).
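Steps 5) and 6) amount to measuring how strongly one pass enriches GP(IPBD) relative to wtGP. One common way to express this, assumed here rather than taken from the text, is the fold change in the GP(IPBD):wtGP ratio between input and output. If every pass is further assumed to give the same fold change, the number of passes needed to make a rare package the majority follows directly. The function names and spike counts are hypothetical.

```python
import math


def enrichment_per_pass(target_in, background_in, target_out, background_out):
    """Fold change in the target:background ratio across one separation pass."""
    return (target_out / background_out) / (target_in / background_in)


def passes_needed(initial_fraction, enrichment, goal_fraction=0.5):
    """Whole passes needed for the target's share of the population to reach goal_fraction,
    assuming every pass multiplies the target:background ratio by the same factor."""
    if enrichment <= 1:
        raise ValueError("a pass must enrich the target (enrichment > 1)")
    start_ratio = initial_fraction / (1 - initial_fraction)
    goal_ratio = goal_fraction / (1 - goal_fraction)
    if start_ratio >= goal_ratio:
        return 0
    return math.ceil(math.log(goal_ratio / start_ratio) / math.log(enrichment))


# Hypothetical spiking experiment: 1e3 GP(IPBD) mixed with 1e9 wtGP;
# 5e2 GP(IPBD) and 1e6 wtGP are recovered from the column.
e = enrichment_per_pass(1e3, 1e9, 5e2, 1e6)
print(round(e))                # 500
print(passes_needed(1e-6, e))  # 3
```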




3) picking a set of several residues in the PPBD to vary; the principal indicators of which residues to vary include: a) the 3D structure of the PPBD, b) sequences of homologous proteins, and c) computer or theoretical modeling that indicates which residues can tolerate different amino acids without disrupting the underlying structure (Sec. 13.1),

4) picking a subset of the residues picked in Part III.3 to be varied simultaneously (Sec. 13.1), where the principal considerations are the number of different variants and which variants are within the detection capabilities of the affinity separation determined in Part II, and setting the range of variation (Sec. 13.2);

5) implementing the variegation by:

a) synthesizing the part of the osp-pbd gene that encodes the residues to be varied, using a specific mixture of nucleotide substrates for some or all of the bases encoding residues slated for variation, thereby creating a population of DNA molecules, denoted vgDNA (Sec. 13.3),

b) ligating this, by standard methods, into the operative cloning vector (OCV) (e.g. a plasmid or bacteriophage) (Sec. 14.1),

c) using the ligated DNA to transform cells, thereby producing a population of transformed cells (Sec. 14.2).
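The trade-off in step 4), between how many residues are varied at once and what the transformation and separation can handle, is largely arithmetic. A minimal sketch, assuming each varied base comes from a mixture of the listed nucleotides and that all resulting sequences are equally abundant (real syntheses deviate from this): the number of distinct vgDNA sequences is the product of the mixture sizes, and a Poisson approximation estimates what fraction of them appears among a given number of transformants. The codon mixture and transformant count are hypothetical, not values from the text.

```python
import math


def dna_variants(base_mixtures):
    """Distinct DNA sequences produced when each synthesized base is drawn from a mixture."""
    count = 1
    for bases in base_mixtures:
        count *= len(set(bases))
    return count


def expected_coverage(n_transformants, n_variants):
    """Expected fraction of variants seen at least once, assuming equal abundance and
    independent sampling by transformants (Poisson approximation)."""
    return 1.0 - math.exp(-n_transformants / n_variants)


# Hypothetical design: three codons varied, each synthesized as (any base)(any base)(G or T).
codon_mixture = ["ACGT", "ACGT", "GT"]
variants = dna_variants(codon_mixture * 3)
print(variants)                                    # 32768
print(round(expected_coverage(1e5, variants), 2))  # 0.95
```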



Abbreviation   Meaning
GP             Genetic Package, e.g. a bacteriophage
wtGP           Wild-type GP
X              Any protein
x              The gene for protein X
IPBD           Initial Potential Binding Domain, e.g. BPTI
PBD            Potential Binding Domain, e.g. a derivative of BPTI
SBD            Successful Binding Domain, e.g. a derivative of BPTI selected for binding to target
PPBD           Parental Potential Binding Domain, i.e. an IPBD or an SBD from a previous selection
OSP            Outer Surface Protein, e.g. coat protein of a phage or LamB from E. coli
OSP-PBD        Fusion of an OSP and a PBD, order of fusion not specified
OSTS           Outer Surface Transport Signal
GP(x)          A genetic package containing the x gene
GP(X)          A genetic package that displays X on its outer surface
GP(osp-pbd)    GP containing an osp-pbd gene
GP(OSP-PBD)    A genetic package that displays PBD on its outside as a fusion to OSP
GP(pbd)        GP containing a pbd gene, osp implicit
GP(PBD)        A genetic package displaying PBD on its outside, OSP unspecified
(Q)            An affinity matrix supporting "Q", e.g. (lysozyme) is lysozyme attached to an affinity matrix
AfM(W)         A molecule having affinity for "W", e.g. trypsin is an AfM(BPTI)
...            AfM(W) carrying a label, e.g. 125I
XINDUCE        A chemical that can induce expression of a gene, e.g. IPTG for the lacUV5 promoter
OCV            Operative Cloning Vector
K_T            K_T = [T][SBD]/[T:SBD] (T is a target)
K_N            K_N = [N][SBD]/[N:SBD] (N is a non-target)
...            Density of AfM(W) on affinity matrix
...            Most-favored amino acid
...            Least-favored amino acid
Abun(x)        Abundance of DNA molecules encoding amino acid x
OMP            Outer membrane protein
nt             nucleotide
K_D            A bimolecular dissociation constant, K_D = [A][B]/[A:B]
SP-I           Signal-sequence Peptidase I
...            Yield of ssDNA up to Q bases long
Q              Maximum length of ssDNA that can be synthesized in acceptable yield
...            Yield of plasmid DNA per volume of culture
...            DNA ligation efficiency
...            Maximum number of transformants produced from ... of insert
...            Efficiency of chromatographic enrichment, enrichment per pass
...            Sensitivity of chromatographic separation, can find 1 in ...
...            Maximum number of enrichment cycles per variegation cycle
...            Error level in synthesizing vgDNA
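K_T, K_N, and K_D in the table are all dissociation constants of the form [A][B]/[A:B]. Rearranging that definition gives the fraction of a binding domain that is occupied at a given free partner concentration, which shows why a K_T much smaller than K_N favors retention on a target column. The constants and concentrations in this sketch are hypothetical, and depletion of the free partner by binding is ignored.

```python
def fraction_bound(free_partner, kd):
    """Fraction of a binding domain that is in complex, given K_D = [A][B]/[A:B].

    Rearranging the definition gives [A:B] / ([B] + [A:B]) = [A] / ([A] + K_D),
    where [A] is the free partner concentration; use the same units for both arguments.
    """
    return free_partner / (free_partner + kd)


# Hypothetical SBD: K_T = 1e-9 M for its target, K_N = 1e-5 M for a non-target,
# each partner present at a free concentration of 1e-7 M.
print(round(fraction_bound(1e-7, 1e-9), 2))  # 0.99
print(round(fraction_bound(1e-7, 1e-5), 2))  # 0.01
```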



Sec. 0.3: Standard sequencing methods:

The present invention is not limited to a single method of determining the sequence of nucleotides (nts) in DNA subsequences. In the preferred embodiment, plasmids are isolated and denatured in the presence of a sequencing primer, about 20 nts long, that anneals to a region adjacent, on the 5' side, to the region of interest. This plasmid is then used as the template in the four sequencing reactions, with one dideoxy substrate in each. Sequencing reactions, agarose gel electrophoresis, and polyacrylamide gel electrophoresis (PAGE) are performed by standard procedures (AUSU87).
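The logic of the four dideoxy (chain-termination) reactions can be sketched in a few lines: each reaction yields fragments ending at every position where its dideoxy base is incorporated, and reading all bands from shortest to longest recovers the sequence. This idealized model assumes complete, unbiased termination and ignores the primer's own length; the function names are hypothetical.

```python
def dideoxy_lanes(extended_strand):
    """Fragment lengths expected in each of the four dideoxy reactions.

    extended_strand: the strand synthesized from the primer, written 5'->3'
    (complementary to the template region being read). A fragment ends wherever
    the reaction's dideoxy base is incorporated; lengths exclude the primer.
    """
    lanes = {base: [] for base in "ACGT"}
    for length, base in enumerate(extended_strand.upper(), start=1):
        lanes[base].append(length)
    return lanes


def read_gel(lanes):
    """Read bands from shortest to longest fragment, as on a sequencing gel."""
    bands = sorted((length, base) for base, lengths in lanes.items() for length in lengths)
    return "".join(base for _, base in bands)


lanes = dideoxy_lanes("ATGCTTTTC")
print(lanes["T"])       # [2, 5, 6, 7, 8]
print(read_gel(lanes))  # ATGCTTTTC
```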

The present invention is not limited to a single method of determining protein sequences, and reference in the appended claims to determining the amino acid sequence of a domain is intended to include any practical method or combination of methods, whether direct or indirect. The preferred method, in most cases, is to determine the sequence of the DNA that encodes the protein and then to infer the amino acid sequence. In some cases, standard methods of protein-sequence determination may be needed to detect post-translational processing.
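Inferring the amino acid sequence from the DNA, the preferred route above, is a codon-by-codon table lookup. A minimal sketch, assuming the standard genetic code and a coding sequence that is already in frame; it cannot reveal post-translational processing, which is why the text reserves direct protein sequencing for that case. The names `CODON_TABLE` and `translate` are illustrative.

```python
BASES = "TCAG"
# Standard genetic code: amino acids for codons in TCAG x TCAG x TCAG order; '*' marks stop.
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    first + second + third: AMINO_ACIDS[16 * i + 4 * j + k]
    for i, first in enumerate(BASES)
    for j, second in enumerate(BASES)
    for k, third in enumerate(BASES)
}


def translate(coding_dna):
    """Translate an in-frame coding sequence, stopping at the first stop codon."""
    coding_dna = coding_dna.upper()
    protein = []
    for pos in range(0, len(coding_dna) - 2, 3):
        residue = CODON_TABLE[coding_dna[pos:pos + 3]]
        if residue == "*":
            break
        protein.append(residue)
    return "".join(protein)


print(translate("ATGCTTTTCTAA"))  # MLF
```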



The major steps in the process of making and isolating a novel binding protein with affinity for a chosen target material are illustrated in Figure 2.

Part I: Specification of a Genetic Package (GP) Displaying a Heterologous Binding Domain on Its Outer Surface:

Sec. 1.0: General Requirements for Genetic Packages



It is emphasized that the GP on which selection-through-binding will be practiced must be capable, after the selection, either of growth in some suitable environment or of in vitro amplification and recovery of the encapsulated genetic message. During at least part of the growth, the increase in number must be approximately exponential with respect to time. The component of a population that exhibits the desired binding properties may be quite small, for example, one in 10^... or less. Once this component of the population is separated from the non-binding components, it must be possible to amplify it. Culturing viable cells is the most powerful amplification of genetic material known and is preferred. Genetic messages can also be amplified in vitro, but this is not preferred.
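Approximately exponential growth is what makes culturing a practical amplifier. Under the idealized assumption of a constant doubling time, the time needed to amplify a small selected population follows from log2 of the desired fold increase. The cell counts and the 30-minute doubling time below are hypothetical (the text elsewhere prefers doubling times of 20-40 minutes); real cultures also show a lag phase and a plateau that this sketch ignores.

```python
import math


def doublings_needed(start_count, goal_count):
    """Doublings required if N(t) = N0 * 2 ** (t / t_double)."""
    return math.log2(goal_count / start_count)


def hours_to_amplify(start_count, goal_count, doubling_minutes):
    """Growth time, in hours, to go from start_count to goal_count at a constant doubling time."""
    return doublings_needed(start_count, goal_count) * doubling_minutes / 60


# Hypothetical: 1e3 cells recovered after selection, grown to 1e9 with a 30-minute doubling time.
print(round(doublings_needed(1e3, 1e9), 1))      # 19.9
print(round(hours_to_amplify(1e3, 1e9, 30), 1))  # 10.0
```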

A GP may typically be a vegetative bacterial cell, a bacterial spore or a bacterial DNA virus. A strain of any living cell or virus is potentially useful if the strain can be:

1) maintained in culture,

2) affinity separated and retain its viability,

3) genetically altered with reasonable facility, and

4) manipulated to display the potential binding protein domain where it can interact with the target material during affinity separation.



We believe that it is possible to cause a genetic package to display the IPBD or PBD on its outer surface without adversely affecting the viability of the GP or the binding characteristics of the IPBD or PBD.



It is generally believed that the part of the polypeptide chain composing one domain folds almost independently of the parts composing other domains. There are natural proteins composed of two or more




anchored the chimeric protein in the lipid bilayer. The MalF protein was incomplete and could not function.

There are two basic methods of arranging that the ipbd gene is expressed in such a manner that the IPBD is displayed on the outer surface of the GP.

First, DNA encoding the IPBD sequence may be operably linked to DNA encoding all or part of an outer surface protein (OSP) native to the GP. If one or more fusions of fragments of x genes to fragments of a natural osp gene are known to cause X protein domains to appear on the GP surface, then we pick the DNA sequence in which an ipbd gene fragment replaces the x gene fragment in one of the successful osp-x fusions as a preferred gene to be tested for the display-of-IPBD phenotype. (The gene may be constructed in any manner.) If no fusion data are available, then we fuse an ipbd fragment to various fragments of the osp gene, such as fragments that end at known or predicted domain boundaries, and obtain GPs that display the osp-ipbd fusion on the GP outer surface by screening or selection for the display-of-IPBD phenotype. The fusion of ipbd and osp fragments may also include fragments of random or pseudorandom DNA to produce a population, members of which may display IPBD on the GP surface. The members displaying IPBD are isolated by screening or selection for the display-of-binding phenotype.



While most bacterial proteins remain in the cytoplasm, others are transported to the periplasmic space (which lies between the plasma membrane and the cell wall of gram-negative bacteria), or are conveyed and anchored to the outer surface of the cell. Still others are exported (secreted) into the medium surrounding the cell. Those characteristics of a protein that are recognized by a cell and that cause it to be transported out of the cytoplasm and displayed on the cell surface will be termed "outer-surface transport signals".

It is believed that the conditions for an outer-surface transport signal are not particularly stringent, i.e., a random polypeptide of appropriate length (preferably 30-100 amino acids) has a reasonable chance of providing such a signal. Thus, by constructing a chimeric gene comprising a segment encoding the IPBD linked to a segment of random or pseudorandom DNA (the potential OSTS), and placing this gene under control of a suitable promoter, there is a possibility that the chimeric protein so encoded will function as an OSP-IPBD.

This possibility is greatly enhanced by constructing numerous such genes, each having a different potential OSTS, cloning them into a suitable host, and selecting for transformants bearing the IPBD (or other marker) on their outer surface.
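Why constructing many genes with different potential OSTSs helps can be put as a simple probability argument. If, as a simplifying assumption, each random segment independently has the same small probability p of functioning as an OSTS, the chance that at least one of n clones works is 1 - (1 - p)^n. The value p = 0.001 below is purely hypothetical; the text gives no such figure.

```python
import math


def chance_at_least_one(p_single, n_clones):
    """P(at least one of n independent clones functions) = 1 - (1 - p)^n."""
    return 1.0 - (1.0 - p_single) ** n_clones


def clones_for_confidence(p_single, confidence):
    """Smallest n with chance_at_least_one(p_single, n) >= confidence."""
    return math.ceil(math.log(1.0 - confidence) / math.log(1.0 - p_single))


# Hypothetical: suppose 1 random segment in 1,000 happens to act as an OSTS.
print(round(chance_at_least_one(1e-3, 1), 4))  # 0.001
print(clones_for_confidence(1e-3, 0.95))       # 2995
```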

The replicable genetic entity (phage or plasmid) that carries the osp-pbd genes (derived from the osp-ipbd gene) through the selection-through-binding process, see Sec. 10, is referred to hereinafter as the operative cloning vector (OCV). When the OCV is a phage, it may also serve as the genetic package. The choice of a GP is dependent in part on the availability of a suitable OCV and suitable OSP.
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PrcforabLy, Cho CP is loadily scored, for example, 
by freozinq. If the CP is a cell, it should hrtvc a 
short doubling time, such as 20-40 tninuccs. If the C? 
is a virus, it should be prolific, e.g., a burst sir.c 
of at least lOO/infectcd cell. CPs which ore CinicKy 
or expensive to culture are disfavored. The CP should 
be easy to harvest, preferably by contr i f ug-it ion . The 
CP is preferably stable for a ter.pcraturo r^nqe of -70 
to 42°C (scable at ^'^C for several days or -ceKi;) : 
resistant to shear forces found in MPLC; i.-.sonsitive to 
UV; tolerant of desiccation; and resistant to a pM of 
to 10.0, surface active agents such as SOS or 
chaotropes such as 4K urea or 2M guanidiniurv 



2 , 0 
Triton 



cor.r.on 



and 



HCl, common ions such as K*, Na*, and 50^ 
organic solvents such as ether and c^cctone. 
degradativc enzymes. Finally, there must be a 3uit.">ble 
OCV (see .^cc. 3) . 



.s. 
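A short doubling time and a large burst size matter because every round of selection must be followed by regrowth of the survivors. The sketch below shows how long that takes. It assumes idealized exponential growth with no lag phase, and the starting and target counts are hypothetical.

```python
import math


def minutes_to_reach(start: float, target: float, doubling_min: float) -> float:
    """Time for an exponentially growing culture to go from start to target cells."""
    generations = math.log2(target / start)
    return generations * doubling_min


def phage_cycles_to_reach(start: int, target: int, burst_size: int) -> int:
    """Infection cycles needed if every infected cell releases burst_size phage (idealized)."""
    if burst_size < 2:
        raise ValueError("burst_size must be at least 2")
    cycles, count = 0, start
    while count < target:
        count *= burst_size
        cycles += 1
    return cycles


print(round(minutes_to_reach(1e3, 1e9, 20)))     # 20-minute doubling time
print(round(minutes_to_reach(1e3, 1e9, 40)))     # 40-minute doubling time
print(phage_cycles_to_reach(1_000, 10**9, 100))  # burst size of 100 per infected cell
```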



Although knowledge of specific OSPs may not be
required for vegetative bacterial cells and endospores,
the user of the present invention, preferably, will
know: Is the sequence of any osp known? (preferably
yes, at least one required for phage). How does the
OSP arrive at the surface of the GP? (knowledge of route
necessary, different routes have different uses, no
route preferred per se). Is the OSP
post-translationally processed? (no processing most
preferred, predictable processing preferred over
unpredictable processing). What rules are known
governing this processing, if there is any processing?
(no processing most preferred, predictable processing
acceptable). What function does the OSP serve in the
outer surface? (preferably not essential). Is the 3D
structure of an OSP known? (highly preferred). Are
fusions between fragments of osp and a fragment of x
known? Does expression of these fusions lead to X
appearing on the surface of the GP? (fusion data is as
preferred as knowledge of a 3D structure). Is a "2D"
structure of an OSP available? (in this context, a "2D"
structure indicates which residues are exposed on the
cell surface) (2D structure less preferred than 3D
structure). Where are the domain boundaries in the
OSP? (not as preferred as a 2D structure, but
acceptable). Could IPBD go through the same process as
OSP and fold correctly? (IPBD might need prosthetic
groups) (preferably IPBD will fold after same
process). Is the sequence of an osp promoter known?
(preferably yes). Is an osp gene controlled by a
regulatable promoter available? (preferably yes). What
activates this promoter? (preferably a diffusible
chemical, such as IPTG). How many different OSPs do we
know? (the more the better). How many copies of each
OSP are present on each package? (more is better).

The user will want knowledge of the physical
attributes of the GP: How large is the GP? (knowledge
useful in deciding how to isolate GPs) (preferably easy
to separate from soluble proteins such as IgGs). What
is the charge on the GP? (neutral preferred). What is
the sedimentation rate of the GP? (knowledge preferred,
no particular value preferred).

The preferred GP, OCV and OSP are those for which
the fewest serious obstacles can be seen, rather than
the one that scores highest on any one criterion.
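The rule of preferring the fewest serious obstacles over the best score on any one criterion amounts to a lexicographic ranking. The sketch below illustrates it hypothetically. The candidate names, criteria, scores, and obstacles are invented for the example and are not data from this document.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class Candidate:
    name: str
    scores: dict[str, float]  # per-criterion desirability, higher is better
    serious_obstacles: set[str] = field(default_factory=set)


def pick_fewest_obstacles(candidates: list[Candidate]) -> Candidate:
    """Fewest serious obstacles first; the total score only breaks ties."""
    return min(candidates, key=lambda c: (len(c.serious_obstacles), -sum(c.scores.values())))


def pick_best_on(candidates: list[Candidate], criterion: str) -> Candidate:
    """The approach the text argues against: maximize a single criterion."""
    return max(candidates, key=lambda c: c.scores.get(criterion, 0.0))


candidates = [
    Candidate("system_A", {"growth": 0.9, "osp_knowledge": 0.3},
              {"no OSP structure known", "no regulatable promoter"}),
    Candidate("system_B", {"growth": 0.6, "osp_knowledge": 0.8}),
    Candidate("system_C", {"growth": 0.7, "osp_knowledge": 0.5}, {"IPBD may not fold"}),
]

print(pick_fewest_obstacles(candidates).name)
print(pick_best_on(candidates, "growth").name)
```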

Next, we consider general answers to the questions
posed in this step for the cases of: a) vegetatively
growing bacterial cells (Sec. 1.1), b) bacterial spores
(Sec. 1.2), and c) phage (Sec. 1.3). Preferred OSPs for
several GPs are given in Table 2.

Sec. 1.1: Bacterial Cells as Genetic Packages:

One may choose any well-characterized bacterial
strain which may be grown in culture. The important
questions in this case are: a) do we know enough about
mechanisms that localize proteins on the outside of the
cell, b) will the IPBD fold in the environment of the
outer membrane, and c) will cells change expression of
osp-pbd, derived from osp-ipbd, during affinity
separation? Some IPBDs may need large or insoluble
prosthetic groups, such as heme or an Fe4S4 cluster,
that are available within the cell, but not in the
medium. The formation of Fe4S4 clusters found in some
ferredoxins is catalyzed by enzymes found in the cell
(BONC35). IPBDs that require such prosthetic groups
may fail to fold or function if displayed on bacterial
cells.

Sec. 1.1.1: Preferred Bacterial Cells as GP:

The species chosen should have a well-characterized
genetic system, and strains defective in
genetic recombination should be available. The chosen
strain may need to be manipulated to prevent changes of
its physiological state that would alter the number or
type of proteins or other molecules on the cell surface
during the affinity separation procedure. In view of
the extensive knowledge of E. coli, a strain of E. coli
defective in recombination is the strongest
candidate as a bacterial GP. Other preferred
candidates are Salmonella typhimurium, Bacillus
subtilis, and Pseudomonas aeruginosa.



Induction of synthesis of engineered genes in
vegetative bacterial cells has been exercised through
the use of regulated promoters such as lacUV5, trpP, or
tac (MANI82). The factors that regulate the quantity
of protein synthesized include: a) promoter strength
(cf. HOOP87), b) rate of initiation of translation (cf.
GOLD88), c) codon usage, d) secondary structure of
mRNA, including attenuators (cf. LAND87) and
terminators (cf. YAGE87), e) interaction of proteins
with mRNA (cf. MCPH86, MILL87b, WINT87), f) degradation
rates of mRNA (cf. SUBB33), and g) proteolysis (cf.
COTT87). These factors are sufficiently well
understood that a wide variety of heterologous proteins
can now be produced in E. coli or B. subtilis in at
least moderate quantities (SKER88, BETT88).

Sec. 1.1.2: Preferred Outer Surface Proteins for
Displaying IPBDs on Bacterial Cells:

Gram-negative bacteria have outer-membrane
proteins (OMPs) that form a subset of OSPs. Many OMPs
span the membrane one or more times. The signals that
cause OMPs to localize in the outer membrane are
encoded in the amino acid sequence of the mature
protein. Fusions of fragments of omp genes with
fragments of an x gene have led to X appearing on the
outer membrane (BENS84, CLEM81). The rules that govern
the localization of OMP-X fusion proteins are not yet
fully elucidated. Many OMPs are polymeric and
non-essential; a non-essential OMP is preferred. A
non-essential OMP for which there is knowledge of which
residues are on the cell surface is more preferred. A
non-essential OMP for which there is data showing that
X is displayed as part of an OMP-X fusion is most
preferred. If no fusion data are available, then we
fuse an ipbd fragment to various fragments of the osp
gene and obtain GPs that display the OSP-IPBD fusion on
the cell outer surface by screening or selection for
the display-of-IPBD phenotype.

Oliver has reviewed mechanisms of protein
secretion in bacteria (OLIV85 and OLIV87). Nikaido and
Vaara ([...]) have reviewed mechanisms by which
proteins become localized to the outer membrane of
Gram-negative bacteria. For example, the LamB protein
of E. coli is synthesized with a typical signal
sequence which is subsequently removed. Benson et al.
(BENS84) showed that LamB-LacZ fusion proteins would be
deposited in the outer membrane of E. coli when
residues 1-49 of the mature LamB protein are included
in the fusion, but that residues 1-41 are insufficient.
The rules that govern localization of proteins in the
outer membrane of Gram-negative bacteria remain vague.
Kaiser et al. (KAIS87) showed that the export signal in
Saccharomyces cerevisiae is very broad, because when
they fused random human DNA sequences to DNA coding for
mature invertase, about one fifth of the sequences
resulted in the appearance of invertase free in the
medium.
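Kaiser et al.'s figure of about one fifth gives a feel for how many random-sequence clones might need to be examined. The sketch below assumes each clone independently succeeds with a fixed probability f. The value 0.2 borrows the yeast invertase figure purely for illustration; the corresponding frequency for an OSTS in a bacterial GP is not given in this text.

```python
import math


def clones_to_screen(success_fraction: float, confidence: float) -> int:
    """Smallest n such that P(at least one success among n clones) >= confidence."""
    if not 0.0 < success_fraction < 1.0:
        raise ValueError("success_fraction must be between 0 and 1")
    if not 0.0 < confidence < 1.0:
        raise ValueError("confidence must be between 0 and 1")
    # 1 - (1 - f)**n >= c  is equivalent to  n >= ln(1 - c) / ln(1 - f)
    return math.ceil(math.log(1.0 - confidence) / math.log(1.0 - success_fraction))


for f in (0.2, 0.01, 0.0001):
    print(f, clones_to_screen(f, 0.99))
```

Even at much lower per-clone success frequencies, the required number of clones grows only as about 1/f. This is the quantitative sense in which constructing numerous candidate genes improves the odds.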

The outer membrane protein LamB of E. coli is a
porin for maltose and maltodextrin transport, and
serves as the receptor for adsorption of bacteriophages
lambda and K10. This protein has been purified to
homogeneity ([...]) and shown to function as a trimer
(PALV70). Mutations to phage resistance have been used
to define the parts of the LamB protein that adsorb
each phage ([...], CLEM81, CLEM83, [...]).
Phage-resistance mutations are dominant, suggesting
that there is no preferential assembly of wild-type or
mutant subunits.

In lamB+ cells, addition of maltose or
maltodextrin inhibits a form of motility called cell
swarming, and lamB mutants defective in this process
have been characterized ([...]). These mutations have
been sequenced and compared to the wild-type sequence
(CLEM81) and the concomitant protein domains have been
analyzed (CLEM83). Topological models have been
developed that describe the function of phage receptor
and maltodextrin transport. The models describe these
domains and their locations with respect to the
surfaces of the outer membrane (CHAR84, [...]).

LamB is transported to the outer membrane if a
functional N-terminal sequence is present; further, the
first 49 amino acids of the mature sequence are
required for successful transport (BENS84). Homology
between parts of the LamB protein and other outer membrane
proteins OmpC, OmpF and PhoE has been detected
([...]), including homology between LamB amino acids
39-49 and sequences of the other proteins. These
subsequences may label the proteins for transport to
the outer membrane. Further, monoclonal antibodies
derived from mice immunized with purified LamB have
been used to characterize four distinct topological and
functional regions, two of which are concerned with
maltose transport ([...]).

General knowledge on processing of signal
sequences in E. coli is relevant to the present
invention both for use of E. coli per se and for use in
conjunction with filamentous phage (vide infra).
Genetic experiments on processing of signal sequences
indicate that if the S21-F22-A23 sequence is preserved,
signal peptidase (SP-I) will cleave after A23 (OLIV87).
Many examples have been cited in which the DNA coding
for the leader or signal sequence from one protein has
been attached to the DNA sequence coding for another
protein, protein X (BECK83, INOU86, [...]). Expression
of such a chimeric gene often causes protein X to
appear free in the periplasm. That is, the leader
causes the new protein to be secreted through the lipid
bilayer; once in the periplasm, the leader is cleaved
off by SP-I.

Sec. 1.1.3: Choice of Insertion Site for IPBD in E. coli OSP:



Beckwith (BECK83 and MANO86) has shown that when
the phoA gene is inserted in frame into the coding
sequence for an integral membrane protein, for example
MalF, the PhoA domain is localized according to
where in the integral membrane protein the phoA gene
was inserted. That is, if phoA is inserted after an
amino acid which normally is found in the cytoplasm,
then PhoA appears in the cytoplasm. If phoA is
inserted after an amino acid normally found in the
periplasm, however, then the PhoA domain is localized
on the periplasmic side of the membrane, and anchored
in it.
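The PhoA result implies a simple rule: the fused domain ends up on the side of the membrane where the residue preceding the junction normally lies. The sketch below encodes that rule against a topology map. The map is a hypothetical example, not the topology of MalF or any real protein. Insertions inside membrane-spanning segments are reported only as "membrane", because the text does not say how they behave.

```python
from bisect import bisect_right

# Hypothetical topology: (first residue of segment, location). Each segment
# extends up to the residue before the next segment's start.
TOPOLOGY = [
    (1, "cytoplasm"),
    (20, "membrane"),
    (41, "periplasm"),
    (80, "membrane"),
    (101, "cytoplasm"),
]


def fused_domain_location(insert_after: int, topology=TOPOLOGY) -> str:
    """Predicted location of a reporter domain inserted after residue insert_after."""
    starts = [start for start, _ in topology]
    idx = bisect_right(starts, insert_after) - 1
    if idx < 0:
        raise ValueError("insertion point precedes the first residue")
    return topology[idx][1]


for site in (10, 50, 120):
    print(site, fused_domain_location(site))
```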

Beckwith and colleagues ([...]) have extended
these observations to the lacZ gene, which can be
inserted into genes for integral membrane proteins such
that the LacZ domain appears in either the cytoplasm or
the periplasm according to where the lacZ gene was
inserted.

OSP-IPBD fusion proteins need not fill a
structural role in the outer membranes of Gram-negative
bacteria because parts of the outer membranes are not
highly ordered. For large OSPs there is likely to be
one or more sites at which osp can be truncated and
fused to ipbd such that cells expressing the fusion
will display IPBDs on the cell surface. If fusions
between fragments of osp and x have been shown to
display X on the cell surface, we can design an
osp-ipbd gene by substituting ipbd for x in the osp-x
sequence. Otherwise, successful OSP-IPBD fusion is
preferably sought by fusing fragments of the best osp
to an ipbd, expressing the fused gene, and testing the
resultant GPs for the display-of-IPBD phenotype. We use
the available data about the OSP to pick the point or
points of fusion between osp and ipbd to maximize the
likelihood that IPBD will be displayed. Alternatively,
we truncate osp at several sites or in a manner that
produces osp fragments of variable length and fuse the
osp fragments to ipbd; cells expressing the fusion are
screened or selected for display of IPBD on the cell
surface. An additional alternative is to include short
segments of random DNA in the fusion of osp fragments
to ipbd and then screen or select the resulting
variegated population for members exhibiting the
display-of-IPBD phenotype.
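The truncation strategy, optionally with short random segments at the junction, can be pictured as building a family of fusion genes. The sketch below works at the DNA level. It assumes the osp fragment is fused N-terminal to ipbd and that truncations fall on codon boundaries so ipbd stays in frame. The sequences, step size, and function names are placeholders.

```python
from __future__ import annotations

import random


def random_codons(n: int, rng: random.Random) -> str:
    """n codons of random DNA (stop codons are not excluded, as with real random inserts)."""
    return "".join(rng.choice("ACGT") for _ in range(3 * n))


def truncation_fusions(osp_dna: str, ipbd_dna: str, step_codons: int,
                       linker_codons: int = 0, seed: int | None = None) -> dict[int, str]:
    """Map each truncation length k (in codons) to osp[:k] + random linker + ipbd."""
    if len(osp_dna) % 3 or len(ipbd_dna) % 3:
        raise ValueError("sequences must be whole numbers of codons")
    if step_codons < 1:
        raise ValueError("step_codons must be positive")
    rng = random.Random(seed)
    fusions = {}
    for k in range(step_codons, len(osp_dna) // 3 + 1, step_codons):
        fusions[k] = osp_dna[: 3 * k] + random_codons(linker_codons, rng) + ipbd_dna
    return fusions


osp = "ATG" + "GCT" * 11   # placeholder 12-codon osp
ipbd = "TGGTGG"            # placeholder 2-codon ipbd
for k, gene in truncation_fusions(osp, ipbd, 4, linker_codons=2, seed=1).items():
    print(k, gene)
```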



The promoter for the osp-ipbd gene, preferably, is
subject to regulation by a small chemical inducer, such
as isopropyl thiogalactoside (IPTG) (lac promoter). It
need not come from a natural osp gene; any regulatable
bacterial promoter can be used.

Once a genetic packaging system employing vegetative
[...] has been designed, it is time to choose an IPBD
[...] processing a natural OSP and a [...].

As an alternative to choosing an insertion site in an
OSP, we construct a gene comprising: a) a regulatable
promoter, b) a Shine-Dalgarno sequence, c) a periplasmic
transport signal sequence, d) a fusion of ipbd and a
segment of random DNA (as previously described), e) a
stop codon, and f) a transcriptional terminator. The
random DNA may encode an OSTS, like those embodied in
known OSPs. The fusion of ipbd and random DNA may be in
either order, [...] being slightly preferred. Isolates
from the population generated in this way are screened
for display of IPBD. Preferably, a version of
selection-through-binding is used to select GPs that
display IPBD; alternatively, isolates are screened for
the display-of-IPBD phenotype.

The preference for [...] arises from a consideration
of the ways in which [...] successful GP(IPBD) [...].
We will introduce numerous [...] region of the gene,
some of which include gratuitous stop codons. If [...]
precedes [...], gratuitous stop codons in the random
DNA [...] protein appearing on the cell surface. If
[...] stop codons in the random DNA [...]; proteins
appearing on [...] specifically sticky so that GPs
displaying incomplete PBDs are easily removed from the
population.

The random DNA can be generated from any DNA
having high sequence diversity by partially digesting
with an enzyme that cuts very often. Sau3A I, for
example, generates cohesive-ended DNA that can be
cloned into a BamH I or Bgl II site. Alternatively,
one could shear DNA having high sequence diversity,
blunt the sheared DNA with the large fragment of
E. coli DNA polymerase I (hereinafter referred to as
Klenow fragment), and clone the sheared and blunted DNA
into blunt sites of the vector (MANI82, p295; AUSU87:
5.1.1).
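Sau3A I is a frequent cutter because its site, GATC, is only four bases long. In uniformly random DNA such a site occurs on average once every 4^4 = 256 bp, and a partial digest leaves a spread of fragment lengths. The sketch below models that. It treats DNA as a single top-strand string and ignores the 4-nt overhang. The per-site cut probability used to mimic partial digestion is a simplifying assumption, not a kinetic model.

```python
from __future__ import annotations

import random

SAU3AI_SITE = "GATC"  # Sau3A I recognition site; the top strand is cut before the G (^GATC)


def sau3ai_sites(seq: str) -> list[int]:
    """0-based positions of every GATC site in seq (top strand only)."""
    seq = seq.upper()
    return [i for i in range(len(seq) - 3) if seq[i:i + 4] == SAU3AI_SITE]


def digest(seq: str, cut_probability: float = 1.0, seed: int | None = None) -> list[str]:
    """Cut seq at GATC sites; cut_probability < 1 mimics a partial digest."""
    rng = random.Random(seed)
    fragments, last = [], 0
    for pos in sau3ai_sites(seq):
        if pos > last and rng.random() < cut_probability:
            fragments.append(seq[last:pos])
            last = pos
    fragments.append(seq[last:])
    return fragments


print(4 ** 4)  # expected spacing of a 4-bp site in uniformly random DNA
print(digest("AAGATCTTGATCAA"))
print(digest("AAGATCTTGATCAA", cut_probability=0.5, seed=3))
```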

Sec. 1.2: Displaying IPBD on Bacterial Spores:

Bacterial spores have desirable properties as GP
candidates. Bacillus spores neither actively
metabolize nor alter the proteins on their surface.
Moreover, spores are much more resistant than vegetative
bacterial cells or phage to chemical and physical
agents. Spores have the disadvantage that the
molecular mechanisms that trigger sporulation are less
well worked out than is the formation of M13 or the
export of protein to the outer membrane of E. coli.

Sec. 1.2.1: Preferred Bacterial Spores for Use as GPs:

Bacteria of the genus Bacillus form endospores
that are extremely resistant to damage by heat,
radiation, desiccation, and toxic chemicals (reviewed
by Losick et al. (LOSI86)). These spores have a complex
structure and morphogenesis that is species-specific
and only partially elucidated. The following
observations are relevant to the use of Bacillus spores
as genetic packages for the purposes of the present
invention.

Plasmid DNA is commonly included in spores.
Plasmid-encoded proteins have been observed on the
surface of Bacillus spores ([...]). Sporulation
involves complex temporal regulation that is now
moderately well understood (LOSI86). Special sigma
factors, such as sigma[...], are produced during
sporulation. RNA polymerase bound to a sporulation
sigma factor recognizes promoters that are not
recognized by RNA polymerase bound to a vegetative
sigma factor. The sequences of several sporulation
promoters are known; coding sequences operatively
linked to such promoters are expressed only during
sporulation. Ray et al. ([...]) have shown that the
G4 promoter of B. subtilis is directly controlled by
RNA polymerase bound to sigma[...].

Donovan et al. have identified several polypeptide
components of the B. subtilis spore coat (DONO87); the
sequences of two complete coat proteins and
amino-terminal fragments of two others have been determined.
Some components of the spore are synthesized in the
forespore, e.g. small acid-soluble spore proteins
(ERRI38), while other components are synthesized in the
mother cell and appear in the spore (e.g. the coat
proteins). This spatial organization of synthesis is
controlled at the transcriptional level.

Spores self-assemble, but the signals that cause
various proteins to localize in different parts of the
spore are not well understood; presumably, the signals
controlling deposition of the coat proteins from the
cytoplasm of the mother cell onto the spore coat are
embedded in the polypeptide sequence. Some, but not
all, of the coat proteins are synthesized as precursors
and are then processed by specific proteases before
deposition in the spore coat ([...]). Viable spores
that differ only slightly from wild-type are produced
in B. subtilis even if any one of four coat proteins is
missing ([...]). Disulfide bonds form within the
spore (thiol reducing agents are needed to solubilize
several of the proteins of the coat). The 12 kd coat
protein, CotD, contains 5 cysteines. CotD also
contains an unusually high number of histidines (16)
and prolines (7). The 11 kd coat protein, CotC,
contains only one cysteine and one methionine. CotC
has a very unusual amino-acid sequence with 19 lysines
(K) appearing as 9 K-K dipeptides and one isolated K.
There are also 20 tyrosines (Y), of which 10 appear as 5
Y-Y dipeptides. Peptides rich in Y and K are known to
become cross-linked in oxidizing environments (DEVO73,
[...], WAIT35, [...]). CotC contains 15 D and E
amino acids, nearly equaling the number of K. There are no
A, F, R, I, L, N, P, S, or W amino acids in CotC.
Neither CotC nor CotD is post-translationally cleaved.
The proteins CotA and CotB are post-translationally
cleaved.
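The CotC composition figures (lysines occurring as K-K dipeptides plus an isolated K, tyrosines partly as Y-Y dipeptides, several residue types absent) can be recomputed from a one-letter sequence with a short script. The example string below is hypothetical and is not the CotC sequence. Running these functions on the published CotC sequence would be the way to check the counts quoted above.

```python
from collections import Counter
from itertools import groupby

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"  # the 20 standard residues, one-letter codes


def run_length_histogram(seq: str, residue: str) -> Counter:
    """{run length: number of runs} for consecutive occurrences of one residue."""
    return Counter(len(list(run)) for aa, run in groupby(seq) if aa == residue)


def absent_residues(seq: str) -> str:
    """Standard amino acids that never occur in seq."""
    return "".join(aa for aa in AMINO_ACIDS if aa not in seq)


example = "MKKYYDKKEYKC"  # hypothetical sequence, for illustration only
print(Counter(example)["K"], run_length_histogram(example, "K"))
print(run_length_histogram(example, "Y"))
print(absent_residues(example))
```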

Endospores from the genus Bacillus are more stable
than are exospores from Streptomyces. Bacillus
subtilis forms spores in 4 to 6 hours, but Streptomyces
species may require days or weeks to sporulate. In
addition, genetic knowledge and manipulation are much
more developed for B. subtilis than for other
spore-forming bacteria. Thus Bacillus spores are preferred
over Streptomyces spores. Bacteria of the genus
Clostridium also form very durable endospores, but
Clostridia, being strict anaerobes, are not convenient
to culture. The choice of a species of Bacillus is
governed by knowledge and availability of cloning
systems and by how easily sporulation can be
controlled. A particular strain is chosen by the
criteria listed in Sec. 1.0. Spores are exposed to an
oxidative environment after release from the mother
cell, so that disulfides, if any, within the IPBD might
form. Many vegetative biochemical pathways are shut
down when sporulation begins, so that prosthetic groups
might not be available.

Sec. 1.2.2: Preferred Outer-Surface Proteins for
Displaying IPBD on Bacterial Spores:

If a spore is chosen as GP, the promoter is the
most important part of the osp gene, because the
promoter of a spore coat protein is most active: a)
when spore coat protein is being synthesized and
deposited onto the spore and b) in the specific place
that spore coat proteins are being made. In B.
subtilis, some of the spore coat proteins are
post-translationally processed by specific proteases. It is
valuable to know the sequences of precursors and mature
coat proteins so that we can avoid incorporating the
recognition sequence of the specific protease into our
construction of an OSP-IPBD fusion. The sequence of a
mature spore coat protein contains information that
causes the protein to be deposited in the spore coat;
thus gene fusions that include some or all of a mature
coat protein sequence are preferred for screening or
selection for the display-of-IPBD phenotype.

Fusions of ipbd fragments to cotC or cotD
fragments are likely to cause IPBD to appear on the
spore surface. The genes cotC and cotD are preferred
osp genes because CotC and CotD are not
post-translationally cleaved. Subsequences from cotA or
cotB could also be used to cause an IPBD to appear on
the surface of B. subtilis spores, but we must take the
post-translational cleavage of these proteins into
account. DNA encoding IPBD could be fused to a
fragment of cotA or cotB at either end of the coding
region or at sites interior to the coding region.
Spores could then be screened or selected for the
display-of-IPBD phenotype.

To date, no Bacillus sporulation promoter has been
shown to be inducible by an exogenous chemical inducer
in the way the lac promoter of E. coli is. Nevertheless, the
quantity of protein produced from a sporulation
promoter can be controlled by other factors, such as
the DNA sequence around the Shine-Dalgarno sequence or
codon usage. Chemically inducible sporulation
promoters can be developed if necessary.



Sec. 1.2.3: Choice of Insertion Site for IPBD in OSP
of Bacterial Spores:

The considerations governing insertion site in the
spore OSP are the same as those given in Section 1.1.3.

Sec. 1.2.4: In vivo Selection for Pseudo-osp
From Random DNA Inserts in Bacillus Spores:

Although the considerations for spores are nearly
identical to the considerations for vegetative
bacterial cells (Sec. 1.1), the available information
on the mechanisms that cause proteins to appear on
spores is meager, so that use of the random-DNA approach
becomes a more attractive option.

We can use the approach described above at 1.1.4
for attaching an IPBD to an E. coli cell, except that:
a) a sporulation promoter is used, and b) no
periplasmic signal sequence should be present.



Sec. 1.3.1: Preferred Phages for Use as GPs:

Unlike bacterial cells and spores, choice of a
phage depends strongly on knowledge of the 3D structure
of an OSP and how it interacts with other proteins in
the capsid. The size of the phage genome and the
packaging mechanism are also important because the
phage genome itself is the cloning vector. The
osp-ipbd gene must be inserted into the phage genome;
therefore:

1) the virion must be capable of accepting the
insertion or substitution of genetic material, and

2) the genome of the phage must be small enough to
allow convenient manipulation.

Additional considerations in choosing phage are:

1) the morphogenetic pathway of the phage
determines the environment in which the IPBD will
have opportunity to fold,

2) IPBDs containing essential disulfides may not
fold within a cell,

3) IPBDs needing large or insoluble prosthetic
groups may not fold if secreted because the
prosthetic group is lacking, and

4) when variegation is introduced in Part III,
multiple infections could generate hybrid GPs that
carry the gene for one PBD but have at least some
copies of a different PBD on their surfaces; it is
preferable to minimize this possibility.

Bacteriophages are excellent candidates for GPs
because there is little or no enzymatic activity
associated with intact mature phage, and because the
genes are inactive outside a bacterial host, rendering
the mature phage particles metabolically inert. The
filamentous phage M13 and bacteriophage PhiX174 are of
particular interest.

Filamentous Phage:

The entire life cycle of the filamentous phage
M13, a common cloning and sequencing vector, is well
understood. M13 and f1 are so closely related that we
consider the properties of each relevant to both
(RASC86); any differentiation is for historical
accuracy. The genetic structure (the complete sequence
([...]), the identity and function of the ten genes,
and the order of transcription and location of the
promoters) of M13 is known, as is the physical
structure of the virion (BANN81, [...], ITOK79, [...],
KUHN87, MAKO80, MARV78, [...], RASC86, [...], SCHA78,
SMIT85, and [...]); see RASC86 for a recent review of the
structure and function of the coat proteins.

Filamentous phage enter cells through the sex
pilus of cells bearing the F-factor. Achtman et al.
(ACHT78) observed that the pilus is extraordinarily
sensitive to SDS; 0.03% SDS inhibits binding of MS2 to
pilin in vitro. Infection may therefore be inhibited
by SDS.

The 50 amino acid mature coat protein is
synthesized as a 73 amino acid precoat (ITOK79). The
first 23 amino acids constitute a typical signal
sequence which causes the nascent polypeptide to be
inserted into the inner cell membrane.

An E. coli signal peptidase (SP-I) recognizes
amino acids 18, 21, and 23, and, to a lesser extent,
residue 22, and cuts between residues 23 and 24 of the
precoat (KUHN85a, KUHN85b, OLIV87). (See also Sec.
1.1.2 for general knowledge on secretion in E. coli.)
After removal of the signal sequence, the amino
terminus of the mature coat is located on the
periplasmic side of the inner membrane; the carboxy
terminus is on the cytoplasmic side. About 3000 copies
of the mature 50 amino acid coat protein associate
side-by-side in the inner membrane.

The gene VI, VII, and IX proteins are also present
at the ends of the virion in about five copies each.
The single-stranded circular phage DNA associates with
about five copies of the gene III protein and is then
extruded through the patch of membrane-associated coat
protein in such a way that the DNA is encased in a
helical sheath of protein ([...]). The DNA does not
base pair (that would impose severe restrictions on the
virus genome); rather the bases intercalate with each
other independent of sequence. Because the M13 genome
is extruded through the membrane and coated by a large
number of identical protein molecules, it can be used
as a cloning vector (WATS87, p273, and [...]). Thus we
can insert extra genes into M13 and they will be
carried along in a stable manner.
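The processing rule stated earlier for E. coli signal peptidase (cleavage after A23 when S21-F22-A23 is preserved) fits the M13 precoat numbers: a 23-residue signal sequence on a 73-residue precoat leaves a 50-residue mature coat. The sketch below applies only that positional rule, which can serve as a quick design check on a fusion. Real SP-I specificity is broader. The precursor is a hypothetical placeholder, not the actual precoat sequence.

```python
def process_precursor(precursor: str, signal_length: int = 23,
                      required_motif: str = "SFA") -> tuple[str, str]:
    """Split precursor into (signal peptide, mature protein).

    Applies only the rule that cleavage occurs after residue `signal_length`
    when the residues ending at that position match `required_motif`
    (S21-F22-A23 for the defaults, 1-based numbering).
    """
    if len(precursor) <= signal_length:
        raise ValueError("precursor is too short to contain a mature protein")
    start = signal_length - len(required_motif)
    window = precursor[start:signal_length]
    if window != required_motif:
        raise ValueError(f"expected {required_motif} at residues {start + 1}-{signal_length}, "
                         f"found {window}; cleavage not predicted by this rule")
    return precursor[:signal_length], precursor[signal_length:]


# Hypothetical 73-residue precursor: 20 filler residues, the S-F-A motif, 50 mature residues.
precursor = "M" + "L" * 19 + "SFA" + "AEGDD" * 10
signal, mature = process_precursor(precursor)
print(len(precursor), len(signal), len(mature))
```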

Marvin and collaborators (MARV78, MAKO80, BANN81)
have determined an approximate 3D virion structure of
f1 by a combination of genetics, biochemistry, and
X-ray diffraction from fibers of the virus. Figure 3 is
drawn after the model of Banner et al. (BANN81) and
shows only the alpha-carbons of the protein. The apparent
holes in the cylindrical sheath are actually filled by
protein side groups so that the DNA within is
protected. The amino terminus of each protein monomer
is to the outside of the cylinder, while the carboxy
terminus is at smaller radius, near the DNA. Although
other filamentous phages (e.g. Pf1 or IKe) have
different helical symmetry, all have coats composed of
many short alpha-helical monomers with the amino
terminus of each monomer on the virion surface.

Bacteriophage PhiX174:

The bacteriophage PhiX174 is a very small
icosahedral virus which has been thoroughly studied by
genetics, biochemistry, and electron microscopy (see
The Single-Stranded DNA Phages (DENH78)). To date, no
proteins from PhiX174 have been studied by X-ray
diffraction. PhiX174 is not used as a cloning vector
because PhiX174 can accept almost no additional DNA;
the virus is so tightly constrained that several of its
genes overlap. Chambers et al. (CHAM82) showed that
mutants in gene G are rescued by the wild-type G gene
carried on a plasmid, so that the host supplies this
protein.

Three gene products of PhiX174 are present on the
outside of the mature virion: F (capsid), G (major
spike protein, 60 copies per virion), and H (minor
spike protein, 12 copies per virion). The G protein
comprises 175 amino acids, while H comprises 328 amino
acids. The F protein interacts with the
single-stranded DNA of the virus. The proteins F, G, and H
are translated from a single mRNA in the virus-infected
cells.

Large DNA Phages

Phage such as lambda or T4 have much larger
genomes than do M13 or PhiX174. Large genomes are less
conveniently manipulated than small genomes. A phage
with a large genome, however, could be used if genetic
manipulation is sufficiently convenient. Phage such as
lambda and T4 have more complicated 3D capsid
structures than M13 or PhiX174, with more OSPs to
choose from. Phage lambda virions and phage T4 virions
form intracellularly, so that IPBDs requiring large or
insoluble prosthetic groups might fold on the surfaces
of these phage.

RNA Phages

RNA phage, such as Qbeta, are not preferred
because manipulation of RNA is much less convenient
than is the manipulation of DNA. Although competent
RNA bacteriophage are not preferred, useful genetically
altered RNA-containing particles could be derived from
RNA phage, such as MS2.

MS2 is a typical small RNA phage that carries only
three genes that are tightly regulated through RNA
structure and protein-RNA interactions. The RNA fills
the protein capsid so that no additional genes can be
accommodated. To use MS2 as a GP, we would need to
eliminate most of the natural viral genome so that an
osp-ipbd gene could fit into the protein capsid. It is
known that the A protein binds sequence-specifically to
a site at the 5' end of the + RNA strand, triggering
formation of RNA-containing particles if coat protein
is present. If a message containing the A protein
binding site and the gene for a chimera of coat protein
and a PBD were produced in a cell that also contained A
protein and wild-type coat protein (both produced from
regulated genes on a plasmid), then the RNA coding for
the chimeric protein would get packaged. The viral RNA
replicase gene is not needed because all components
needed for formation of particles are encoded in DNA. A
package comprising RNA encapsulated by proteins encoded
by that RNA satisfies the major criterion that the
genetic message inside the package specifies something
on the outside. The particles by themselves are not
viable. After isolating the packages that carry an
SBD, we would need to:

1) separate the RNA from the protein capsid,

2) reverse transcribe the RNA into DNA, using AMV or MMTV reverse transcriptase, and

3) use Thermus aquaticus DNA polymerase for 25 or more cycles of Polymerase Chain Reaction (PCR) to amplify the DNA until there is enough to subclone the recovered genetic message into a plasmid for sequencing and further work.
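Step 3 calls for 25 or more PCR cycles. The arithmetic behind that figure can be sketched under an idealized model in which each cycle multiplies the template by (1 + E), where E is a per-cycle efficiency between 0 and 1. The model ignores the plateau phase of real reactions, and the function names and example numbers are hypothetical.

```python
def pcr_copies(initial_copies: float, cycles: int, efficiency: float = 1.0) -> float:
    """Idealized PCR yield: every cycle multiplies the template by (1 + efficiency)."""
    if cycles < 0 or not 0.0 <= efficiency <= 1.0:
        raise ValueError("cycles must be >= 0 and efficiency must lie in [0, 1]")
    return initial_copies * (1.0 + efficiency) ** cycles


def cycles_needed(initial_copies: float, target_copies: float, efficiency: float = 1.0) -> int:
    """Smallest number of cycles that reaches target_copies under the same idealized model."""
    if initial_copies <= 0 or not 0.0 < efficiency <= 1.0:
        raise ValueError("need a positive template count and efficiency in (0, 1]")
    copies, n = initial_copies, 0
    while copies < target_copies:
        copies *= 1.0 + efficiency
        n += 1
    return n


# 100 recovered templates after 25 perfect cycles, and the cycles needed to reach ~1e9 copies.
print(pcr_copies(100, 25))
print(cycles_needed(1, 1e9), cycles_needed(1, 1e9, efficiency=0.9))
```

Lower per-cycle efficiency raises the cycle count needed, which is one reason a protocol specifies a minimum ("25 or more") rather than an exact number.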

Alternatively, helper phage could be used to rescue the isolated phage. In one of these ways we can recover a sequence that codes for an SBD having desirable binding properties. The in vitro amplification (SAIK85, SCHA86, US Patents 4,683,202 and 4,683,195) may be conveniently carried out using a Perkin-Elmer/Cetus Thermal Cycler (part number N801-0150) and GeneAmp DNA Amplification Reagent Kit (N801-0043) supplied by Perkin-Elmer Corp., 761 Main Avenue, Norwalk, CT, 06859-0012, USA. The primers used in the Polymerase Chain Reaction should be picked so that the osp-pbd gene is the part of the reverse-transcribed DNA that is amplified.

Although such a procedure is much more cumbersome than use of DNA phage, it may be of interest if: 1) the genetic package of the RNA phage is much more stable than any DNA phage, 2) the 3D structure of an RNA phage is known (f2 forms crystals inside E. coli, suggesting that structure determination of the f2 virion may be practical), or 3) folding of a large protein inside a cell is desired (this scheme allows almost the entire 3.5 kb genome of MS2 to be used for chimeric coat protein-PBD). Use of fusions involving MS2 coat protein, together with wild-type coat protein, to encapsulate genes demonstrates the most primitive system that could be employed in the present invention. Although the system has certain technical inconveniences and therefore is not preferred, it could be used.
Sec. 1.3.2: Preferred Outer Surface Proteins for Displaying IPBDs on Phage:

For a given bacteriophage, the preferred OSP is usually one that is present on the phage surface in the largest number of copies, as this allows the greatest flexibility in varying the ratio of OSP-IPBD to wild-type OSP and also gives the highest likelihood of obtaining satisfactory affinity separation. Moreover, a protein present in only one or a few copies usually performs an essential function in morphogenesis or infection; mutating such a protein by addition or insertion is likely to reduce the viability of the GP.

It is preferred that the wild-type osp gene be preserved. The ipbd gene fragment may be inserted either into a second copy of the recipient osp gene or into a novel engineered osp gene. It is preferred that the osp-ipbd gene be placed under control of a regulated promoter. Our process forces the evolution of the PBDs derived from IPBD so that some of them develop a novel function, viz. binding to a chosen target. Placing the gene that is subject to evolution on a duplicate gene imitates the widely accepted scenario for the evolution of protein families. It is now generally accepted that gene duplication is the first step in the evolution of one protein from an ancestral protein. By having two copies of a gene, the affected physiological process can tolerate mutations in one of the genes. This process is well understood and documented for the globin family (cf. DICK83, p65ff, and CREI84, p117-125).
The preferred OSP for use when the GP is M13 is the gene III protein (see Example 1).

Sec. 1.3.3: Choice of Insertion Site for IPBD in OSP:

The user must choose a site in the candidate OSP gene for inserting an ipbd gene fragment. The coats of most bacteriophage are highly ordered. Filamentous phage can be described by a helical lattice; isometric phage, by an icosahedral lattice. Each monomer of each major coat protein sits on a lattice point and makes defined interactions with each of its neighbors. Proteins that fit into the lattice by making some, but not all, of the normal lattice contacts are likely to destabilize the virion by: a) aborting formation of the virion, b) making the virion unstable, or c) leaving gaps in the virion so that the nucleic acid is not protected. Thus in bacteriophage, unlike the cases of bacteria and spores, it is important to retain most or all of the residues of the parental OSP in engineered OSP-IPBD fusion proteins.

Association of proteins into dimers, trimers, or even larger structures represents yet another aspect of protein binding. For proteins that form such associations, heterologous mixtures of mutant and normal proteins will form if the mutations have not altered the interface between subunits. For example, Ward et al. have shown that tyrosyl tRNA synthetase will form heterodimers when mutant and normal protein are allowed to refold together (WARD86). See also Hickman and Levy (HICK88), who studied the multimeric structure of the Tet protein by engineering cells to carry two different tet alleles and observing a Tet^r (tetracycline-resistant) phenotype arising from the complementary alleles. They conclude that the Tet protein is multimeric.
Immunoglobulin formation depends on the ability of heavy-chain and light-chain domains, each part of a separately synthesized protein, to associate independently of the protein sequence in the antigen complementarity-determining regions. In addition, the process of immune complementation depends on the separability of the binding properties of the complementarity-determining regions from the binding properties of the constant domains.



Auditore-Hargreaves, US Patents 4,470,925 (AUDI84a) and 4,479,395 (AUDI84b), teaches methods of making hybrid antibodies that depend on association of different antibody chains. These patents teach that alterations far from the intermolecular interface do not alter the association.

A preferred site for insertion of the ipbd gene into the phage osp gene is one in which: a) the IPBD folds into its original shape, b) the OSP domains fold into their original shapes, and c) there is no interference between the two domains. It is not required that the IPBD and OSP domains have any particular spatial relationship; hence the process of this invention does not require use of the method of US Patent '602.



If there is a 3D model of the phage that indicates that either the amino or carboxy terminus of an OSP is exposed to solvent, then the exposed terminus of that mature OSP becomes the prime candidate for insertion of the ipbd gene. A low-resolution 3D model suffices.
In the absence of a 3D structure, the amino and carboxy termini of the mature OSP are the best candidates for insertion of the ipbd gene. A functional fusion may require additional residues between the IPBD and OSP domains to avoid unwanted interactions between the domains. Random-sequence DNA, or DNA coding for a specific sequence of a protein homologous to the IPBD or OSP, can be inserted between the osp fragment and the ipbd fragment if needed.

Fusion at a domain boundary within the OSP is also a good approach for obtaining a functional fusion. Smith exploited such a boundary when subcloning heterologous DNA into gene III of f1 (SMIT85).

There are several methods of identifying domains. Methods that rely on atomic coordinates have been reviewed by Janin and Chothia (JANI85). These methods use matrices of distances between alpha carbons (C-alpha), dividing planes (cf. ROSE85), or buried surface (RASH84). Chothia and collaborators have correlated the behavior of many natural proteins with domain structure (according to their definition). Rashin correctly predicted the stability of a domain comprising residues 206-316 of thermolysin (VITA84, RASH84).
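As a toy illustration of the distance-matrix idea (not a reimplementation of the methods reviewed by Janin and Chothia, nor of Rashin's), the sketch below builds a C-alpha contact map and picks the single cut point that minimizes the ratio of contacts between the two segments to contacts within them. The 8 Å cutoff, the scoring rule, the minimum segment size, and the two-helix test structure are all assumptions made for the example.

```python
import numpy as np


def contact_map(ca: np.ndarray, cutoff: float = 8.0) -> np.ndarray:
    """Boolean N x N map of C-alpha pairs closer than cutoff (angstroms), diagonal excluded."""
    diff = ca[:, None, :] - ca[None, :, :]
    dist = np.sqrt((diff ** 2).sum(axis=-1))
    contacts = dist < cutoff
    np.fill_diagonal(contacts, False)
    return contacts


def best_two_domain_cut(ca: np.ndarray, cutoff: float = 8.0, min_size: int = 10):
    """Cut k splitting residues into [0, k) and [k, N) with the lowest inter/intra contact ratio."""
    c = contact_map(ca, cutoff)
    n = len(ca)
    best_k, best_score = None, float("inf")
    for k in range(min_size, n - min_size + 1):
        inter = c[:k, k:].sum()
        intra = (c[:k, :k].sum() + c[k:, k:].sum()) / 2  # symmetric map counts each pair twice
        score = inter / max(intra, 1.0)
        if score < best_score:
            best_k, best_score = k, float(score)
    return best_k, best_score


def helix(n: int, offset) -> np.ndarray:
    """Idealized alpha-helix C-alpha trace: radius 2.3 A, rise 1.5 A, 100 degrees per residue."""
    i = np.arange(n)
    theta = np.deg2rad(100.0) * i
    trace = np.column_stack([2.3 * np.cos(theta), 2.3 * np.sin(theta), 1.5 * i])
    return trace + np.asarray(offset, dtype=float)


# Toy "protein": two 20-residue helices placed 50 A apart.
coords = np.vstack([helix(20, (0, 0, 0)), helix(20, (50, 0, 0))])
print(best_two_domain_cut(coords, min_size=5))
```

On real structures the contact cutoff and scoring rule matter a great deal, and the published methods handle domains formed from non-contiguous segments, which this single-cut sketch cannot.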

Many researchers have used partial proteolysis and protein sequence analysis to isolate and identify stable domains. (See, for example, VITA84, POTE83, SCOT87, and PABO79.) Pabo et al. used calorimetry as an indicator that the cI repressor from the coliphage lambda contains two domains; they then used partial proteolysis to determine the location of the domain boundary.



It is generally believed that the part of the polypeptide chain composing one domain folds almost independently of the parts composing other domains. There are natural proteins composed of two or more domains for which there is strong evidence that essentially the same domain occurs more than once, for example ovomucoids and ovoinhibitors (SCOT87) and kallikrein (GILB86). Further, the same domain can occur in several different proteins (SUDH85, GILB85, and SCOT87).

If the only structural information available is the amino acid sequence of the candidate OSP, we can use the sequence to predict turns and loops. There is a high probability that some of the loops and turns will be correctly predicted (cf. Chou and Fasman (CHOU72)); these locations are also candidates for insertion of the ipbd gene fragment.
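Turn-prediction schemes such as Chou and Fasman's average per-residue turn propensities over short windows. The scan below shows only that mechanic: the propensity values in `toy` are placeholders chosen for the demonstration, not the published Chou-Fasman parameters, and the window length, threshold, and default value are assumptions.

```python
from typing import Dict, List, Tuple


def turn_windows(seq: str, propensity: Dict[str, float], window: int = 4,
                 threshold: float = 1.0, default: float = 1.0) -> List[Tuple[int, float]]:
    """Return (0-based start, mean propensity) for every window whose mean exceeds threshold."""
    values = [propensity.get(aa, default) for aa in seq.upper()]
    hits = []
    for start in range(len(values) - window + 1):
        mean = sum(values[start:start + window]) / window
        if mean > threshold:
            hits.append((start, mean))
    return hits


# Placeholder table: elevated values for G, P, N, D only to illustrate; not real parameters.
toy = {"G": 1.5, "P": 1.5, "N": 1.5, "D": 1.4}
print(turn_windows("AAAAGPNGAAAA", toy, default=0.8))
```

Overlapping hits around the G-P-N-G stretch would mark one candidate region for inserting the ipbd fragment; a real prediction would use a published propensity table and compare turn propensity against helix and strand propensities as well.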

Sec. 1.3.4: In Vivo Selection for Pseudo-OSP Gene from Random DNA Inserts in Vector Sources:

Alternatively, a functional insertion site may be determined by generating a number of recombinant constructions and selecting the functional strain by phenotypic characteristics. Because the OSP-IPBD must fulfill a structural role in the phage coat, it is unlikely that any particular random DNA sequence coupled to the ipbd gene will produce a fusion protein that fits into the coat in a functional way. Nevertheless, random DNA inserted between large fragments of a coat protein gene and the ipbd gene will produce a population that is likely to contain one or more members that display the IPBD on the outside of a viable phage. A display probe, similar to that defined in Sec. 1.1.…, is constructed and random DNA sequences are cloned into appropriate sites.



Sec. 2: Choice of IPBD:

An IPBD may be chosen from naturally occurring proteins or domains of naturally occurring proteins, or may be designed from first principles. A designed protein may have advantages over natural proteins if: a) the designed protein is more stable, b) the designed protein is smaller, and c) the charge distribution of the designed protein can be specified more freely.

A candidate IPBD must meet the following criteria:

1) a domain exists that will remain stable under the conditions of its intended use (the domain may comprise the entire protein that will be inserted, e.g. BPTI),

2) knowledge of the amino acid sequence is obtainable,

3) knowledge of the identity of the residues on the domain's outer surface, and their spatial relationships, is obtainable, and

4) a molecule is available having specific and high affinity for the IPBD, AfM(IPBD).



Preferably, the IPBD is no larger than necessary because it is easier to arrange restriction sites in smaller amino-acid sequences and because a smaller protein minimizes the metabolic strain on the GP or the host of the GP. The usefulness of candidate IPBDs that meet all of these requirements depends on the availability of the information discussed below.
Information about candidate IPBDs that will be used to judge the suitability of the IPBD includes: 1) a 3D structure (knowledge strongly preferred), 2) one or more sequences homologous to the IPBD (the more homologous sequences known, the better), 3) the pI of the IPBD (knowledge necessary in some cases), 4) the stability and solubility as a function of temperature, pH and ionic strength (preferably known to be stable over a wide range and soluble in conditions of intended use), 5) ability to bind metal ions such as Ca++ or Mg++ (knowledge preferred; binding per se, no preference), 6) enzymatic activities, if any (knowledge preferred; activity per se has uses but may cause problems), 7) binding properties, if any (knowledge preferred; specific binding also preferred), 8) availability of a molecule having specific and strong affinity (K_d < 10^-… M) for the IPBD (preferred), 9) availability of a molecule having specific and medium affinity (10^-… M < K_d < 10^-… M) for the IPBD (preferred), 10) the sequence of a mutant of IPBD that does not bind to the affinity molecule(s) (preferred), and 11) absorption spectrum in visible, UV, NMR, etc. (characteristic absorption preferred).
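Items 8 and 9 express affinity as a dissociation constant, while the elution analysis of Sec. 4.1 reasons in terms of binding free energy. The standard relation ΔG° = RT ln(K_d / 1 M) links the two; the sketch below applies it at 25 °C. The K_d values in the loop are arbitrary examples, not the thresholds of items 8 and 9.

```python
import math

R_J = 8.314462618  # gas constant, J/(mol*K)


def dg_from_kd(kd_molar: float, temp_k: float = 298.15) -> float:
    """Standard binding free energy in kJ/mol from K_d (1 M standard state); negative = favorable."""
    if kd_molar <= 0:
        raise ValueError("K_d must be positive")
    return R_J * temp_k * math.log(kd_molar) / 1000.0


for kd in (1e-6, 1e-9, 1e-12):
    print(f"K_d = {kd:.0e} M  ->  dG = {dg_from_kd(kd):6.1f} kJ/mol")
```

Each thousand-fold tightening of K_d lowers ΔG° by about 17 kJ/mol at 25 °C, which gives a feel for how far apart "strong" and "medium" affinity reagents sit on the free-energy scale.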

If only one species of molecule having affinity for IPBD (AfM(IPBD)) is available, it will be used to: a) detect the IPBD on the GP surface, b) optimize expression level and density of the affinity molecule on the matrix (Sec. 10.1), and c) determine the efficiency and sensitivity of the affinity separation (Secs. 10.2 and 10.3). As noted above, however, one would prefer to have available two species of AfM(IPBD), one with high and one with moderate affinity for the IPBD. The species with high affinity would be used in initial detection and in determining efficiency and sensitivity (Secs. 10.2 and 10.3), and the species with moderate affinity would be used in optimization (Sec. 10.1).

There are many candidate IPBDs, 20 or more, for which all of the above information is available or is reasonably practical to obtain, for example, bovine pancreatic trypsin inhibitor (BPTI, 58 residues), crambin (46 residues), the third domain of ovomucoid (56 residues), T4 lysozyme (164 residues), and azurin (128 residues). Structural information can be obtained from X-ray or neutron diffraction studies, NMR, chemical cross-linking or labeling, modeling from known structures of related proteins, or from theoretical calculations. 3D structural information obtained by X-ray diffraction, neutron diffraction, or NMR is preferred because these methods allow localization of almost all of the atoms to within defined limits.

In most of the PBDs derived from an IPBD according to the process of the present invention, the altered residues have side groups directed toward the solvent. Reidhaar-Olson and Sauer (REID88) found that exposed residues can accept a wide range of amino acids, while buried residues are more limited in this regard. Surface mutations typically have only small effects on the melting temperature of the PBD, but may reduce the stability of the PBD. Hence the chosen IPBD should have a high melting temperature (…°C acceptable, the higher the better) and be stable over a wide pH range (pH 3.0 to 8.0 acceptable; pH 2.0 to 11.0 preferred), so that the SBDs derived from the chosen IPBD by mutation and selection-through-binding will retain sufficient stability. Preferably, the substitutions in the IPBD yielding the various PBDs do not reduce the melting point of the domain below 50°C. Mutations may arise that increase the stability of SBDs relative to the IPBD, but the process of the present invention does not depend upon this occurring.

Two general characteristics of the target molecule, size and charge, make certain classes of IPBDs more likely than other classes to yield derivatives that will bind specifically to the target. Because these are very general characteristics, one can divide all targets into six classes: a) large positive, b) large neutral, c) large negative, d) small positive, e) small neutral, and f) small negative. A small collection of IPBDs, one or a few corresponding to each class of target, will contain a preferred candidate IPBD for any chosen target.

Alternatively, the user may elect to engineer a GP(IPBD) for a particular target; Sec. 2.1 gives criteria that relate target size and charge to the choice of IPBD.

Sec. 2.1.1: Influence of target size on choice of IPBD:

If the target is a protein or other macromolecule, a preferred embodiment of the IPBD is a small protein such as BPTI from Bos taurus (58 residues), crambin from rape seed (46 residues), or the third domain of ovomucoid from Coturnix coturnix japonica (Japanese quail) (56 residues) (PAPA82), because targets from this class have clefts and grooves that can accommodate small proteins in highly specific ways. If the target is a macromolecule lacking a compact structure, such as starch, it should be treated as if it were a small molecule. Extended macromolecules with defined 3D structure, such as collagen, should be treated as large molecules.

If the target is a small molecule, such as a steroid, a preferred embodiment of the IPBD is a protein the size of ribonuclease from Bos taurus (124 residues), ribonuclease from Aspergillus oryzae (104 residues), hen egg white lysozyme from Gallus gallus (129 residues), azurin from Pseudomonas aeruginosa (128 residues), or T4 lysozyme (164 residues), because such proteins have clefts and grooves into which the small target molecules can fit. The Brookhaven Protein Data Bank contains 3D structures for all of the proteins listed. Genes encoding proteins as large as T4 lysozyme can be manipulated by standard techniques for the purposes of this invention.

If the target is a mineral, insoluble in water, one must consider the nature of the molecular surface of the mineral. Minerals that have smooth surfaces, such as crystalline silicon, require medium to large proteins, such as ribonuclease, as IPBD in order to have sufficient contact area and specificity. Minerals with rough, grooved surfaces, such as zeolites, could be bound either by small proteins, such as BPTI, or larger proteins, such as T4 lysozyme.
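The six-class scheme of Sec. 2, combined with the size rules of this section, can be expressed as a small lookup. In this sketch the 5,000 Da size cutoff and the +/-0.5 neutral band for net charge are hypothetical placeholders (the text gives no numbers), the `has_defined_3d` flag encodes the starch versus collagen rule, mineral targets are not handled, and the residue counts are those given above.

```python
from typing import Dict, Tuple

# Residue counts as given in Sec. 2 and Sec. 2.1.1.
SMALL_IPBDS: Dict[str, int] = {"BPTI": 58, "crambin": 46, "ovomucoid third domain": 56}
LARGER_IPBDS: Dict[str, int] = {
    "bovine ribonuclease": 124,
    "Aspergillus oryzae ribonuclease": 104,
    "hen egg white lysozyme": 129,
    "azurin": 128,
    "T4 lysozyme": 164,
}


def target_class(mw_da: float, net_charge: float, has_defined_3d: bool = True,
                 size_cutoff_da: float = 5000.0, neutral_band: float = 0.5) -> Tuple[str, str]:
    """Assign (size, charge) class; the cutoffs are hypothetical placeholders."""
    # A macromolecule lacking a compact, defined 3D structure (e.g. starch) is treated as small.
    large = mw_da >= size_cutoff_da and has_defined_3d
    if net_charge > neutral_band:
        charge = "positive"
    elif net_charge < -neutral_band:
        charge = "negative"
    else:
        charge = "neutral"
    return ("large" if large else "small", charge)


def candidate_ipbds(size: str) -> Dict[str, int]:
    """Large targets -> small IPBDs; small targets -> medium-to-large IPBDs."""
    return dict(SMALL_IPBDS if size == "large" else LARGER_IPBDS)


size, charge = target_class(66000, -15)   # e.g. an acidic protein target
print(size, charge, sorted(candidate_ipbds(size)))
```

The charge class returned here would then be used to favor IPBDs of opposite or neutral charge, following the rule stated in Sec. 2.1.2.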

Sec. 2.1.2: Influence of target charge on choice of IPBD:

Electrostatic repulsion between molecules of like charge can prevent molecules with highly complementary surfaces from binding. Therefore, it is preferred that, under the conditions of intended use, the IPBD and the target molecule either have opposite charge or that one of them is neutral. In some cases it has been observed that protein molecules bind in such a way that like-charged groups are juxtaposed by including oppositely charged counter ions in the molecular interface. Thus, inclusion of counter ions can reduce or eliminate electrostatic repulsion, and the user may elect to include ions in the eluants used in the affinity separation step. Polyvalent ions are more effective at reducing repulsion than monovalent ions.
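One standard way to quantify why polyvalent ions screen electrostatic interactions more strongly is the ionic strength, I = ½ Σ c_i z_i², in which each ion's contribution grows with the square of its charge. This is a general physical-chemistry relation rather than something the text specifies, and it describes bulk screening, not the specific counter-ion bridging mentioned above. The Debye-length approximation used (about 0.304 nm / √I for water at 25 °C) is likewise a textbook estimate; the salt concentrations are illustrative.

```python
import math
from typing import Iterable, Tuple


def ionic_strength(ions: Iterable[Tuple[float, int]]) -> float:
    """I = 1/2 * sum(c_i * z_i**2) for (molar concentration, charge) pairs."""
    return 0.5 * sum(c * z * z for c, z in ions)


def debye_length_nm(ionic_strength_m: float) -> float:
    """Approximate Debye screening length in water at 25 C: about 0.304 nm / sqrt(I)."""
    return 0.304 / math.sqrt(ionic_strength_m)


nacl = ionic_strength([(0.1, +1), (0.1, -1)])    # 0.1 M NaCl
mgso4 = ionic_strength([(0.1, +2), (0.1, -2)])   # 0.1 M MgSO4
print(nacl, mgso4)
print(debye_length_nm(nacl), debye_length_nm(mgso4))
```

At equal molarity the 2:2 salt gives four times the ionic strength of the 1:1 salt and halves the screening length, consistent with the statement that polyvalent ions are more effective.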

Sec. 2.1.3: Other considerations in the choice of IPBD:

If the chosen IPBD is an enzyme, it may be necessary to change one or more residues in the active site to inactivate enzyme function. For example, if the IPBD were T4 lysozyme and the GP were E. coli cells or M13, we would need to inactivate the lysozyme because otherwise it would lyse the cells. If, on the other hand, the GP were PhiX174, then inactivation of lysozyme may not be needed because T4 lysozyme can be overproduced inside E. coli cells without detrimental effects and PhiX174 forms intracellularly. It is preferred to inactivate enzyme IPBDs that might be harmful to the GP or its host by substituting mutant amino acids at one or more residues of the active site. It is permitted to vary one or more of the residues that were changed to abolish the original enzymatic activity of the IPBD. Those GPs that receive osp-pbd genes encoding an active enzyme may die, but the majority of sequences will not be deleterious.

If the binding protein is intended for therapeutic use in humans or animals, the IPBD may be chosen from proteins native to the designated recipient to minimize the possibility of antigenic reactions.

Sec. 3: Choice of OCV:

The OCV is preferably small, e.g., less than 10 kb. The size of the OCV affects the stability of the OCV and its derivatives, and the copy number thereof. An OCV which is stable, even after insertion of at least 1 kb of DNA, is sought. A multicopy OCV is also of interest. It is desirable that cassette mutagenesis be practical in the OCV; preferably, at least 25 restriction enzymes are available that do not cut the OCV. It is likewise desirable that single-stranded mutagenesis be practical. Finally, the OCV preferably carries a selectable marker.
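The size and non-cutter criteria above lend themselves to a quick computational screen of a candidate vector sequence. In this sketch the enzyme table is a short illustrative subset of twelve common type II enzymes (far fewer than the 25 non-cutters the text asks for, so a real check would use a full enzyme database). The function names, and the toy sequences in the usage lines, are hypothetical.

```python
from typing import Dict, List

# Recognition sequences of a few common type II enzymes (all palindromic).
ENZYMES: Dict[str, str] = {
    "EcoRI": "GAATTC", "BamHI": "GGATCC", "HindIII": "AAGCTT", "PstI": "CTGCAG",
    "SalI": "GTCGAC", "XbaI": "TCTAGA", "KpnI": "GGTACC", "SmaI": "CCCGGG",
    "XhoI": "CTCGAG", "NotI": "GCGGCCGC", "SphI": "GCATGC", "NcoI": "CCATGG",
}

_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def revcomp(seq: str) -> str:
    return seq.translate(_COMPLEMENT)[::-1]


def cutters(vector: str, enzymes: Dict[str, str], circular: bool = True) -> List[str]:
    """Enzymes whose site occurs on either strand, including sites spanning the origin if circular."""
    seq = vector.upper()
    hits = []
    for name, site in enzymes.items():
        probe = seq + seq[: len(site) - 1] if circular else seq
        if site in probe or revcomp(site) in probe:
            hits.append(name)
    return hits


def ocv_report(vector: str, enzymes: Dict[str, str] = ENZYMES,
               max_len: int = 10_000, min_noncutters: int = 25) -> dict:
    cut = set(cutters(vector, enzymes))
    noncutters = sorted(set(enzymes) - cut)
    return {
        "length_ok": len(vector) < max_len,
        "noncutters": noncutters,
        "enough_noncutters": len(noncutters) >= min_noncutters,
    }


print(ocv_report("GAATTC" + "A" * 50))
```

The wrap-around probe matters for plasmids: a site split across the arbitrary sequence origin still cuts the circular molecule.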

If a suitable OCV does not already exist, it may be engineered by manipulation of available vectors.

In the cases of bacterial cells and bacterial spores, the bacterial chromosome could be used as the OCV. Plasmids are, however, preferred because genes on plasmids are much more easily constructed and mutated than are genes in the bacterial chromosome. When bacteriophage are to be used, the osp-ipbd gene must be inserted into the phage genome. The synthetic osp-ipbd genes can be constructed in small vectors and transferred to the GP genome when complete.
Phage such as M13 do not confer antibiotic resistance on the host, so one cannot select for cells infected with M13. An antibiotic resistance gene can be engineered into the M13 genome (HINE80). More virulent phage, such as PhiX174, make discernible plaques that can be picked, in which case a resistance gene is not essential; furthermore, there is no room in the PhiX174 virion to add any new genetic material. Inability to include an antibiotic resistance gene is a disadvantage because it limits the number of GPs that can be screened.



It is preferred that GP(IPBD) carry a selectable marker not carried by wtGP. It is also preferred that wtGP carry a selectable marker not carried by GP(IPBD).

Sec. 4: Designing the osp-ipbd gene insert:

Having chosen an IPBD, a GP, a strategy for getting the IPBD onto the GP surface, and a cloning vector, we now turn to the design of a suitably regulated gene. In this section, we design an amino acid sequence that will cause the IPBD to appear on the GP surface when it is expressed. This amino acid sequence may determine the entire coding region of the osp-ipbd gene, or it may contain only the ipbd sequence adjoining restriction sites into which random DNA will be cloned (Sec. 6.2).

We will now consider the transcriptional regulation of the osp-ipbd gene; the design of the DNA encoding the amino acid sequences; the organization of synthesis; the methods of DNA synthesis and purification; and the actual gene synthesis and cloning.
The actual gene may be: a) completely synthetic, b) a composite of natural and synthetic DNA, or c) a composite of natural DNA fragments. The important point is that the pbd segment, derived from the ipbd segment, be easily genetically manipulated in the ways described in Part III. A synthetic pbd segment is preferred because it allows greatest control over placement of restriction sites. Primers complementary to regions abutting the osp-ipbd gene on its 3' flank and to parts of the osp-ipbd gene that are not to be varied are needed for sequencing.
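Because a synthetic segment gives control over restriction-site placement, a first design step is often to back-translate the intended amino acid sequence and scan the result for sites that appear by accident. The sketch below does this with one fixed codon per amino acid; that codon table and the five-enzyme site list are assumptions chosen for illustration, not a table prescribed by the text.

```python
from typing import Dict, List

# One codon per amino acid; an illustrative choice (an assumption), not a prescribed table.
CODON: Dict[str, str] = {
    "A": "GCG", "R": "CGT", "N": "AAC", "D": "GAT", "C": "TGC", "Q": "CAG",
    "E": "GAA", "G": "GGC", "H": "CAT", "I": "ATT", "L": "CTG", "K": "AAA",
    "M": "ATG", "F": "TTT", "P": "CCG", "S": "AGC", "T": "ACC", "W": "TGG",
    "Y": "TAT", "V": "GTG", "*": "TAA",
}
DECODE: Dict[str, str] = {codon: aa for aa, codon in CODON.items()}

SITES: Dict[str, str] = {
    "EcoRI": "GAATTC", "BamHI": "GGATCC", "HindIII": "AAGCTT", "PstI": "CTGCAG", "XhoI": "CTCGAG",
}


def back_translate(protein: str) -> str:
    """Encode a protein with the single codon chosen for each residue."""
    return "".join(CODON[aa] for aa in protein.upper())


def translate(dna: str) -> str:
    """Decode whole codons; only codons present in CODON are recognized."""
    return "".join(DECODE[dna[i:i + 3]] for i in range(0, len(dna) - len(dna) % 3, 3))


def sites_present(dna: str, sites: Dict[str, str] = SITES) -> List[str]:
    return [name for name, site in sites.items() if site in dna]


dna = back_translate("MKVEF")
print(dna, translate(dna), sites_present(dna))
```

Codon choice is where the control comes from: with TTT for Phe, Glu-Phe encodes GAATTT, whereas the synonymous TTC codon would give GAATTC and create an EcoRI site.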
Sec. 4.1: Genetic regulation of the osp-ipbd gene:

Now we consider regulation of the osp-ipbd gene to enable modulation of expression. The two important questions are: a) how much OSP-IPBD do we need on each GP, and b) how accurately must we regulate the amount?

The essential function of the affinity separation is to separate GPs that bear PBDs (derived from IPBD) having high affinity for the target from GPs bearing PBDs having low affinity for the target. If the elution volume of a GP depends on the number of PBDs on the GP surface, then a GP bearing many PBDs with low affinity, GP(PBD_w), might co-elute with a GP bearing fewer PBDs with high affinity, GP(PBD_s). Assume that both GP(PBD_w) and GP(PBD_s) bind to the column under some condition, such as low salt. If a gradient of some solute, such as increasing salt, changes the conditions, then all weakly binding PBDs will cease to bind before any strongly binding PBDs cease to bind. Regulation of the osp-pbd gene must be such that all packages display sufficient PBD to effect a good separation in Sec. 15. If the amount of PBD/GP had an effect on the elution volume of the GP from the affinity matrix, then we would need to regulate the amount of PBD/GP very accurately. The following analysis shows that there is no strong linear effect of PBD/GP on elution volume, and assumes only: a) that all GPs are the same size, b) that interactions between the PBDs and the affinity matrix dominate differential elution of GPs, c) that the system is at equilibrium, and d) that all PBDs on any one GP are identical.

If $n_p$ identical PBDs on a GP each have access to target molecules, and each PBD has a free energy of binding to the target of $\Delta G_1$, then the total free energy of binding is

$$\Delta G_{\mathrm{total}} = n_p \, \Delta G_1 .$$

$\Delta G_1$ will be a function of several parameters of the solvent, such as: 1) concentration of ions, 2) pH, 3) temperature, 4) concentration of neutral solutes such as sucrose, glucose, ethanol, etc., and 5) specific ions, such as calcium, acetate, benzoate, nicotinate, etc. If conditions are altered during affinity separation so that $\Delta G_1$ approaches zero, $\Delta G_{\mathrm{total}}$ approaches zero $n_p$ times faster. As $\Delta G_{\mathrm{total}}$ goes to or above zero, the packages will dissociate from the immobilized target molecules and be eluted.

GPs bearing more PBDs have a sharper transition between bound and unbound than packages with fewer of the same PBDs. Under equilibrium conditions, the midpoint of the transition is determined only by the solution conditions that bring the individual interactions to zero free energy. The number of PBDs/GP determines the sharpness of the transition.
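The claim that the transition midpoint does not depend on n_p, while its sharpness grows with n_p, can be checked with a simple two-state sketch. Beyond the text, it assumes that the fraction of packages bound is 1 / (1 + exp(ΔG_total / RT)) with ΔG_total = n_p ΔG_1, and that ΔG_1 rises linearly along a salt gradient; the 0.3 M midpoint and the 20 kJ/mol per M slope are hypothetical.

```python
import math

R_KJ = 8.314462618e-3  # gas constant, kJ/(mol*K)


def fraction_bound(n_p: int, dg1_kj: float, temp_k: float = 298.15) -> float:
    """Two-state sketch: P(bound) = 1 / (1 + exp(dG_total / RT)), with dG_total = n_p * dG_1."""
    x = n_p * dg1_kj / (R_KJ * temp_k)
    if x >= 0:  # numerically stable form of the logistic for large |x|
        e = math.exp(-x)
        return e / (1.0 + e)
    return 1.0 / (1.0 + math.exp(x))


def dg1_along_gradient(salt_m: float, midpoint_m: float = 0.3, k_kj_per_m: float = 20.0) -> float:
    """Hypothetical linear model: dG_1 rises with salt and crosses zero at midpoint_m."""
    return k_kj_per_m * (salt_m - midpoint_m)


def transition_width(n_p: int, k_kj_per_m: float = 20.0, temp_k: float = 298.15) -> float:
    """Salt range (M) over which the bound fraction falls from 0.9 to 0.1: 2 ln(9) RT / (n_p k)."""
    return 2.0 * math.log(9.0) * R_KJ * temp_k / (n_p * k_kj_per_m)


for n_p in (1, 5, 25):
    mid = fraction_bound(n_p, dg1_along_gradient(0.3))
    print(f"n_p={n_p:2d}  bound at midpoint={mid:.2f}  0.9->0.1 width={transition_width(n_p):.3f} M")
```

Under these assumptions every n_p gives exactly half bound at the gradient midpoint, while the 0.9-to-0.1 width shrinks as 1/n_p, which is the qualitative behavior described above.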
It should also be noted that the number of PBDs/GP is usually influenced by physiological conditions, so that a sample of genetically identical GP(PBD)s may contain GPs having different numbers of PBDs on the GP surface. In a population of GP(vgPBD)s each PBD sequence will appear on more than one GP, and the actual number of PBDs/GP will vary from GP to GP within some range. Within a variegated population of PBDs, let PBD_j be the PBD with maximum affinity for the target. If there is a linear effect of the number of PBDs/GP, then the GPs having the greatest number of PBD_j will be most retarded on the column. When we culture the enriched population obtained either as an effluent from the column or as an inoculum of matrix material from the column, the GP(PBD_j) will be amplified and give rise to new GP(PBD_j)s having varying numbers of PBD_j/GP. Thus the affinity separation process of the present invention could tolerate a linear effect of the number of PBDs/GP on the elution volume of the GP(PBD), unless strong binding to the target fortuitously causes the PBD to be displayed on the GP only in low number. It is extremely unlikely that all PBDs that bind to the target will also be incapable of display in large amounts on the GP surface.
According to the above analysis, there is no linear effect on elution volume from the number of IPBDs/GP; hence the need for highly accurate regulation of IPBD/GP is not anticipated. The analysis above assumes that GP(IPBD)s are in equilibrium between solution in buffer and bound to the affinity matrix. Rate of elution may be an important parameter in column affinity chromatography. In batch elution from an affinity matrix or elution from an affinity plate, the time that each buffer is in contact with the affinity material may be an important variable. The density of affinity molecules on the matrix is an important variable in optimizing the affinity separation. Because the analysis above is qualitative, in Sec. 10 of the preferred embodiment we experimentally optimize: 1) the density of IPBD on the GP surface, 2) the density of affinity molecules on the affinity matrix, 3) the initial ionic strength, 4) the elution rate, and 5) the quantity of GP/(volume of matrix) to be loaded on the column.
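Sec. 10 optimizes these five variables experimentally. One simple way to lay out such a screen, offered here only as an illustration and not as the procedure of Sec. 10, is a full-factorial grid. The parameter names and levels below are hypothetical placeholders; with more levels per factor, a fractional design would keep the number of runs manageable.

```python
from itertools import product
from typing import Dict, Iterator, List

# Hypothetical levels for the five variables listed above; real levels must come from experiment.
LEVELS: Dict[str, List[float]] = {
    "inducer_mM": [0.0, 0.1, 1.0],           # sets IPBD density on the GP surface
    "ligand_mg_per_ml_matrix": [0.5, 2.0],   # affinity-molecule density on the matrix
    "initial_ionic_strength_M": [0.05, 0.15],
    "flow_ml_per_min": [0.2, 1.0],           # elution rate
    "gp_per_ml_matrix": [1e9, 1e11],         # load per volume of matrix
}


def full_factorial(levels: Dict[str, List[float]]) -> Iterator[Dict[str, float]]:
    """Yield one dict per combination of levels, in a fixed, reproducible order."""
    names = list(levels)
    for combo in product(*(levels[name] for name in names)):
        yield dict(zip(names, combo))


runs = list(full_factorial(LEVELS))
print(len(runs), runs[0])
```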

A number of promoters are known that can be controlled by specific chemicals added to the culture medium. For example, the lacUV5 promoter is induced if isopropylthiogalactoside (IPTG) is added to the culture medium, for example, at between 1.0 µM and 10.0 mM. Hereinafter, we use "XINDUCE" as a generic term for a chemical that induces expression of a gene.
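For the IPTG range quoted above, the volume of stock to add follows from C1V1 = C2V2. The sketch assumes a hypothetical 1 M stock and a 50 mL culture, treats the culture volume as the final volume, and is illustrative only; at the 1 µM end of the range the required volume is far below a microliter, so a diluted working stock would be more practical.

```python
def stock_volume_ul(final_conc_m: float, culture_ml: float, stock_conc_m: float) -> float:
    """C1*V1 = C2*V2: microliters of stock for a given final concentration in culture_ml."""
    if not 0 < final_conc_m < stock_conc_m:
        raise ValueError("final concentration must be positive and below the stock concentration")
    return final_conc_m * culture_ml * 1000.0 / stock_conc_m


IPTG_RANGE_M = (1.0e-6, 10.0e-3)  # range quoted in the text

for final in (1.0e-6, 1.0e-3, 10.0e-3):
    assert IPTG_RANGE_M[0] <= final <= IPTG_RANGE_M[1]
    print(f"{final:.0e} M in 50 mL from 1 M stock: {stock_volume_ul(final, 50.0, 1.0):.3f} uL")
```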

Transcriptional regulation of gene expression is the best understood and most effective form of regulation, so we focus our attention on the promoter. If transcription of the osp-ipbd gene is controlled by the chemical XINDUCE, then the number of OSP-IPBDs per GP increases with increasing concentrations of XINDUCE until a fall-off in the number of viable packages is observed or until sufficient IPBD is observed on the surface of harvested GP(IPBD)s. The attributes that affect the maximum number of OSP-IPBDs per GP are primarily structural in nature. There may be steric hindrance or other unwanted interactions between IPBDs if OSP-IPBD is substituted for every wild-type OSP. Excessive levels of OSP-IPBD may also adversely affect the solubility or morphogenesis of the GP. For cellular and viral GPs, as few as five copies of a protein having affinity for another immobilized molecule have resulted in successful affinity separations (FERE82a, FERE82b, and SMIT85).
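The titration described above can be sketched as a loop over log-spaced inducer concentrations spanning the 1.0 μM to 10.0 mM range quoted for lacUV5. This is an illustration only: `measure` is a hypothetical callback standing in for the laboratory readouts (IPBD per GP and viable packages), and the stopping thresholds are assumptions, not values from the text.

```python
import math


def titrate_xinduce(measure, c_min=1e-6, c_max=1e-2, steps_per_decade=2,
                    target_display=None, min_viability=0.5):
    """Raise [XINDUCE] (molar) on a log scale from c_min to c_max.

    measure(conc) is a hypothetical callback returning
    (IPBD displayed per GP, number of viable packages) at that concentration.
    Stops when viability drops below min_viability times the first reading
    (a fall-off in viable packages) or when target_display is reached.
    Returns the list of (conc, display, viable) readings taken.
    """
    readings = []
    n_steps = round(math.log10(c_max / c_min) * steps_per_decade)
    for i in range(n_steps + 1):
        conc = c_min * 10 ** (i / steps_per_decade)
        display, viable = measure(conc)
        readings.append((conc, display, viable))
        if viable < min_viability * readings[0][2]:
            break  # fall-off in the number of viable packages
        if target_display is not None and display >= target_display:
            break  # sufficient IPBD observed on the surface
    return readings
```

Whichever stopping rule fires first ends the series; the last reading then marks the usable upper limit of induction under these assumptions.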

Another consideration of promoter regulation is that it is useful later to know the range of regulation of the osp-ipbd gene (Sec. 8). In particular, one should determine how nearly the absence of XINDUCE leads to the absence of IPBD on the GP surface; a non-leaky promoter is preferred. Non-leakiness is useful: a) to show that the affinity of GP(osp-ipbd)s for AfM(IPBD) is due to the osp-ipbd gene, and b) to allow growth of GP(osp-ipbd) in the absence of XINDUCE if the expression of osp-ipbd is disadvantageous. The lacUV5 promoter in conjunction with the lacIq repressor is a preferred example.

Sec. 4.2: DNA Sequence Design:

The present invention is not limited to a single method of gene design. The following procedure is an example of one method of gene design that fills the needs of the present invention.

Having specified that the amount of IPBD/GP is to be experimentally optimized and that well-studied, available regulatory mechanisms applied to the osp-ipbd gene are sufficient, we now consider design of a DNA sequence. If the amino-acid sequence of OSP-IPBD is a definite sequence, then the entire gene will be constructed (Sec. 6.1). If random DNA is to be fused to ipbd, then a "display probe" is constructed first; random DNA is then inserted to complete the population of putative osp-ipbd genes (Sec. 6.2) from which a functional osp-ipbd gene is identified by in vivo selection or kindred techniques.

The osp-ipbd gene need not be synthesized de novo; parts of the gene may be obtained from nature. One may use any genetic engineering method to produce the correct gene fusion, so long as one can easily and accurately direct mutations to specific sites in the ipbd DNA subsequence (Sec. 14.1). In all of the methods of mutagenesis considered in the present invention, however, it is necessary that the DNA sequence for the osp-ipbd gene be different from any other DNA in the OCV. The degree and nature of difference needed is determined by the method of mutagenesis to be used in Sec. 14.1. If the method of mutagenesis is to be replacement of subsequences coding for the PBD with vgDNA, then the subsequences to be mutagenized must be bounded by restriction sites that are unique with respect to the rest of the OCV. If single-stranded-oligonucleotide-directed mutagenesis is to be used, then the DNA sequence of the subsequence coding for the IPBD must be unique with respect to the rest of the OCV.

The sequences of regulatory parts of the gene are taken from the sequences of natural regulatory elements: a) promoters, b) Shine-Dalgarno sequences, and c) transcriptional terminators. Regulatory elements could also be designed from knowledge of consensus sequences of natural regulatory regions. The sequences of these regulatory elements are connected to the coding regions; restriction sites are also inserted in or adjacent to the regulatory regions to allow convenient manipulation.

The coding portions of genes to be synthesized are designed at the protein level and then encoded in DNA. The amino acid sequences are chosen to achieve various goals, including: a) display of an IPBD on the surface of a GP, b) change of charge on an IPBD, and c) generation of a population of PBDs from which to select an SBD. The ambiguity in the genetic code is exploited to allow optimal placement of restriction sites and to create various distributions of amino acids at variegated codons.



Specific DNA sequence assignment:

A computer program may be used to construct an ambiguous DNA sequence coding for an amino-acid sequence given by the user. That is, the DNA sequence contains codes for all possible DNA sequences that produce the stated amino acid sequence. The codes used in the ambiguous DNA are shown in Table 1. An example of an ambiguous DNA sequence is given in Table 3.
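A minimal Python sketch of this back-translation step follows; the patent does not name a language or program. It assumes the standard genetic code and the standard IUPAC one-letter codes, whereas the patent's own codes are those of Table 1 and may differ. Leu, Ser and Arg cannot be written as one degenerate codon without also admitting codons for other amino acids, so the sketch returns one degenerate codon per group of synonymous codons that share their first two bases. Function names are illustrative.

```python
from collections import defaultdict

# Standard genetic code (assumption), codons enumerated in TCAG order.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    a + b + c: AMINO[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}

# IUPAC nucleotide codes keyed by the set of bases each one stands for.
IUPAC = {frozenset(bases): code for code, bases in {
    "A": "A", "C": "C", "G": "G", "T": "T", "R": "AG", "Y": "CT",
    "S": "CG", "W": "AT", "K": "GT", "M": "AC", "B": "CGT",
    "D": "AGT", "H": "ACT", "V": "ACG", "N": "ACGT"}.items()}


def ambiguous_codons(aa):
    """Degenerate codons that together encode exactly the codons for `aa`."""
    groups = defaultdict(set)
    for codon, residue in CODON_TABLE.items():
        if residue == aa:
            groups[codon[:2]].add(codon[2])
    if not groups:
        raise ValueError(f"unknown amino acid {aa!r}")
    return sorted(prefix + IUPAC[frozenset(thirds)]
                  for prefix, thirds in groups.items())


def ambiguous_dna(protein):
    """One list of alternative degenerate codons per residue."""
    return [ambiguous_codons(aa) for aa in protein.upper()]


print(ambiguous_dna("KSLC"))
# [['AAR'], ['AGY', 'TCN'], ['CTN', 'TTR'], ['TGY']]
```

Keeping the six-codon residues as alternatives, rather than collapsing them, means every expansion of the output translates back to the input sequence.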

The user supplies lists of restriction enzymes that: a) do not cut the OCV, and b) cut the OCV only once or twice. For each enzyme the program reads: a) the name, b) the recognition sequence, c) the cutting pattern, and d) the names of suppliers. The ambiguous DNA sequence coding for the stated amino acid sequence is examined for places where recognition sites for any of the given enzymes could be created without altering the amino-acid sequence. A master table of enzymes could be obtained from the catalogues of enzyme suppliers, such as the suppliers listed in Table 4, or from other sources, such as Roberts' annual review of restriction enzymes in Nucleic Acids Research.
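The search can be sketched as follows, assuming the standard genetic code and recognition sequences written in IUPAC codes. Because each codon overlaps a different stretch of the recognition sequence, a site can be created at a given offset exactly when every overlapped codon has a synonym matching its part of the site. `creatable_sites` and `_fits` are hypothetical names, not the patent's program.

```python
# Standard genetic code (assumption), codons enumerated in TCAG order.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
SYNONYMS = {}
for i, a in enumerate(BASES):
    for j, b in enumerate(BASES):
        for k, c in enumerate(BASES):
            SYNONYMS.setdefault(AMINO[16 * i + 4 * j + k], []).append(a + b + c)

# IUPAC codes that may appear in recognition sequences.
IUPAC_SETS = {"A": "A", "C": "C", "G": "G", "T": "T", "R": "AG", "Y": "CT",
              "S": "CG", "W": "AT", "K": "GT", "M": "AC", "B": "CGT",
              "D": "AGT", "H": "ACT", "V": "ACG", "N": "ACGT"}


def _fits(codon, codon_start, site, site_start):
    """True if every base of `codon` that overlaps the site is allowed there."""
    for k, base in enumerate(codon):
        pos = codon_start + k - site_start
        if 0 <= pos < len(site) and base not in IUPAC_SETS[site[pos]]:
            return False
    return True


def creatable_sites(protein, site):
    """(0-based nt start, chosen codons) wherever `site` can be made silently."""
    hits = []
    for start in range(len(protein) * 3 - len(site) + 1):
        end = start + len(site)
        codons = []
        for ci in range(start // 3, (end - 1) // 3 + 1):
            options = [c for c in SYNONYMS[protein[ci]]
                       if _fits(c, 3 * ci, site, start)]
            if not options:
                break
            codons.append(options[0])
        else:  # every overlapped codon has a matching synonym
            hits.append((start, codons))
    return hits


print(creatable_sites("KSLC", "AAGCTT"))  # HindIII
# [(2, ['AAA', 'AGC', 'TTA'])]
```

For Lys-Ser-Leu-Cys the only silent HindIII site starts at the third base of the Lys codon, giving AAA AGC TTr, which matches the "result" line of the record shown in the text.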

Each potential recognition site causes a record similar to the following to be written:

    possible DNA : AAr AGy TTr TGC
    cutter       :   A AGC TT
    result       : AAA AGC TTr

    5'NNNNNA AGCTTNNNNN3'
    3'NNNNNTTCGA ANNNNN5'



The top line identifies the enzyme, HindIII in this example, and the supplier (through codes given in Table 4); "Loc=9" indicates that recognition begins with nucleotide 9; "t=9" indicates that the antisense (top) strand of DNA is cut after base 9; "a=13" indicates that the sense strand (or bottom strand, not shown except in the dsDNA on the right) is cut between bases 13 and 14 (reading left to right). "Dir=n" indicates that recognition is "normal". HindIII recognizes a palindromic sequence, as do most restriction enzymes. Some enzymes have asymmetric recognition, however, and cut to one side; for these enzymes, the recognition could be "normal" or "reversed" depending on whether the enzyme cuts to the right or left of the recognition site. Here, unambiguous stretches that require certain restriction sites are labeled "obligatory"; those that are elective are so labeled.



The second and third lines show the amino-acid sequence and residue numbers for which this region of DNA codes. The notation "Cut # 1/6" indicates that this is the first of six possible HindIII sites. The fourth line shows the antisense strand of DNA coding for the desired amino-acid sequence. The fifth line shows the recognition pattern of the enzyme. The sixth line shows the consensus between the DNA sequence required by the amino acid sequence and the DNA sequence recognized by the restriction enzyme. The dsDNA to the right shows the ends generated by the restriction digestion.
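The "t" and "a" fields follow from the site position and the enzyme's cutting pattern. A small Python sketch, assuming the cutting pattern is stored as two offsets counted from the first base of the recognition sequence (for HindIII, A^AGCTT: top strand after base 1, bottom strand after base 5). `cut_record` and its arguments are hypothetical names.

```python
def cut_record(loc, top_offset, bottom_offset, site):
    """Return the t and a fields and the single-stranded end left by the cut.

    loc: 1-based number of the first base of the recognition site
    top_offset: top strand is cut after this many bases of the site
    bottom_offset: bottom strand is cut after this many bases (left to right)
    """
    t = loc + top_offset - 1        # last base before the top-strand cut
    a = loc + bottom_offset - 1     # bottom strand cut between a and a + 1
    lo, hi = sorted((top_offset, bottom_offset))
    overhang = site[lo:hi]          # bases left single-stranded
    if t < a:
        kind = "5' overhang"
    elif t > a:
        kind = "3' overhang"
    else:
        kind = "blunt"
    return {"t": t, "a": a, "overhang": overhang, "kind": kind}


print(cut_record(9, 1, 5, "AAGCTT"))
# {'t': 9, 'a': 13, 'overhang': 'AGCT', 'kind': "5' overhang"}
```

With Loc=9 this reproduces t=9 and a=13 from the record and the AGCT 5' overhang drawn in its dsDNA panel. An enzyme such as SphI (GCATG^C, offsets 5 and 1) would instead be reported as leaving a CATG 3' overhang.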

The program also prints a table summarizing the possible sites. An example of such a summary of potential sites is found in Table 5.

The choice of elective restriction sites to be built into the gene is determined as follows. The goal is to have a series of fairly uniformly spaced unique restriction sites with no more than a preset maximum number of bases, for example 100, between sites. Unless required by other sites, sites that are not present in the parental OCV are not introduced into the designed gene more than once. Sites that occur only once or twice in the parental OCV are not introduced into the designed gene unless necessary.
First, each enzyme that has a unique possible site is picked; if two of these overlap, then the better enzyme is picked. An enzyme is better if it: a) generates cohesive ends, b) has unambiguous recognition, or c) has higher specific activity. Next, those sites close to other sites already picked are eliminated, because many sites very close together are not useful. Finally, sites are chosen to minimize the size of the longest piece between restriction sites.
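The last two steps can be approximated greedily, as in the Python sketch below. It assumes the unique-site enzymes from the first step have already been resolved into `forced` positions, and it does not model the enzyme ranking a) to c). Positions are nucleotide coordinates in the gene. The name `choose_elective_sites` and the default `min_sep` of 10 are hypothetical choices.

```python
def choose_elective_sites(gene_len, forced, candidates, max_gap=100, min_sep=10):
    """Greedy sketch: keep the forced (unique-site) positions, then repeatedly
    split the longest remaining piece with the candidate nearest its middle,
    skipping candidates closer than min_sep to a chosen site, until no piece
    exceeds max_gap or no usable candidate remains."""
    sep = max(min_sep, 1)
    chosen = sorted(set(forced))
    pool = sorted(c for c in set(candidates)
                  if all(abs(c - s) >= sep for s in chosen))
    while True:
        bounds = [0] + chosen + [gene_len]
        pieces = sorted(((bounds[i + 1] - bounds[i], bounds[i], bounds[i + 1])
                         for i in range(len(bounds) - 1)), reverse=True)
        added = False
        for length, left, right in pieces:
            if length <= max_gap:
                break  # every remaining piece is short enough
            inside = [c for c in pool if left + sep <= c <= right - sep]
            if inside:
                middle = (left + right) / 2
                pick = min(inside, key=lambda c: abs(c - middle))
                chosen = sorted(chosen + [pick])
                pool = [c for c in pool if abs(c - pick) >= sep]
                added = True
                break
        if not added:
            return chosen
```

Splitting the longest piece at its midpoint is one simple way to reduce the maximum piece length; it is not guaranteed to find the global optimum.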
The ambiguity of the DNA between the restriction sites is resolved from the following considerations. If the given amino acid sequence occurs in the recipient organism, and if the DNA sequence of the gene in the organism is known, then, preferably, we maximize the differences between the engineered and natural genes to minimize the potential for recombination. In addition, the following codons are poorly translated in E. coli and, therefore, are avoided if possible: cta (L), cga (R), cgg (R), and agg (R). For other host species, different codon restrictions would be appropriate. Finally, long repeats of any one base are prone to mutation and thus are avoided. Balancing these considerations, we can design a DNA sequence.
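One way to balance these considerations is a greedy, codon-by-codon choice. In the Python sketch below, the priority order (avoid the listed rare codons, then avoid single-base runs longer than `max_run`, then minimize identity to the natural codon) and the default `max_run=4` are assumptions; the text does not rank the criteria. The standard genetic code is also assumed.

```python
# Standard genetic code (assumption), codons enumerated in TCAG order.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
SYNONYMS = {}
for i, a in enumerate(BASES):
    for j, b in enumerate(BASES):
        for k, c in enumerate(BASES):
            SYNONYMS.setdefault(AMINO[16 * i + 4 * j + k], []).append(a + b + c)

RARE_IN_E_COLI = {"CTA", "CGA", "CGG", "AGG"}  # codons listed in the text


def longest_run(seq):
    """Length of the longest stretch of one repeated base."""
    best = run = 0
    prev = None
    for base in seq:
        run = run + 1 if base == prev else 1
        prev = base
        best = max(best, run)
    return best


def codon_score(codon, tail, natural_codon, max_run):
    """Lower is better: (rare?, run excess, identity to the natural codon)."""
    rare = codon in RARE_IN_E_COLI
    excess_run = max(0, longest_run(tail + codon) - max_run)
    identity = sum(x == y for x, y in zip(codon, natural_codon))
    return (rare, excess_run, identity)


def resolve_codons(protein, natural="", max_run=4):
    """Pick one codon per residue, left to right."""
    dna = ""
    for i, aa in enumerate(protein.upper()):
        nat = natural[3 * i:3 * i + 3]  # empty if no natural gene is given
        dna += min(SYNONYMS[aa],
                   key=lambda c: codon_score(c, dna[-max_run:], nat, max_run))
    return dna
```

For example, `resolve_codons("R", natural="CGT")` returns AGA: it avoids the rare CGA, CGG and AGG and differs from the natural CGT at two of three positions.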

Sec. 5.1: Gene Synthesis:

Now we consider ways to divide the synthesis of the designed gene into manageable segments. The present invention is not limited as to how a designed DNA sequence is divided for easy synthesis. The following procedure is an example of how such synthesis might be managed.
An established method is to synthesize both strands of the entire gene in overlapping segments of 20 to 50 nucleotides (nts) (THER83). Below, we provide an alternative method that is more suitable for synthesis of vgDNA. This method is similar to methods published by Oliphant et al. (OLIP86 and OLIP87) and Ausubel et al. (AUSU87). Our adaptation of this method differs from previous methods in that we: a) use two synthetic strands, and b) do not cut the extended DNA in the middle. Our goals are: a) to produce longer pieces of dsDNA than can be synthesized as ssDNA on commercial DNA synthesizers, and b) to produce strands complementary to single-stranded vgDNA. By using two synthetic strands, we remove the requirement for a palindromic sequence at the 3' end.

DNA synthesizers can currently produce oligo-nts of lengths up to 100 nts in reasonable yield; $M_{DNA} = 100$. The parameters $N_W$ (the length of overlap needed to obtain efficient annealing) and $N_S$ (the number of spacer bases needed so that a restriction enzyme can cut near the end of blunt-ended dsDNA) are determined by DNA and enzyme chemistry. $N_W = 10$ and $N_S = 5$ are reasonable values. Larger values of $N_W$ and $N_S$ are allowed, but they add to the length of ssDNA that must be synthesized and reduce the net length of dsDNA that can be produced.

Let $A_L$ be the actual length of dsDNA to be synthesized, including any spacers. $A_L$ must be no greater than $(2M_{DNA} - N_W)$. Let $Q_W$ be the number of nts that the overlap window can deviate from center:

$$Q_W = \frac{2M_{DNA} - N_W - A_L}{2}$$

$Q_W$ is never negative. It is preferred that the two fragments be approximately the same length so that the amounts synthesized will be approximately equal. This preference may be overridden by other considerations. The overall yield of dsDNA is usually dominated by the synthetic yield of the longer oligo-nt.
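The length bookkeeping can be checked with a few lines of Python. This sketch uses the default values $M_{DNA} = 100$ and $N_W = 10$ from the text. The function names are illustrative, and `overlap_start` is an assumed 0-based coordinate of the window's left edge on the anti-sense strand.

```python
def max_ds_length(m_dna=100, n_w=10):
    """Longest dsDNA (A_L) obtainable from two oligos of at most m_dna nts."""
    return 2 * m_dna - n_w


def window_slack(a_l, m_dna=100, n_w=10):
    """Q_W: how far (in nts) the overlap window may move off centre."""
    q_w = (2 * m_dna - n_w - a_l) / 2
    if q_w < 0:
        raise ValueError("A_L exceeds 2*M_DNA - N_W; divide the synthesis further")
    return q_w


def oligo_lengths(a_l, overlap_start, n_w=10):
    """Lengths of the anti-sense oligo (5' end through the overlap) and of the
    sense oligo (from the right-hand end back through the overlap)."""
    return overlap_start + n_w, a_l - overlap_start


print(max_ds_length())          # 190
print(window_slack(170))        # 10.0
print(oligo_lengths(170, 80))   # (90, 90)
```

A centred window gives two oligos of $(A_L + N_W)/2$ nts each; moving the window by $Q_W$ in either direction brings the longer oligo to exactly $M_{DNA}$ nts, which is why $Q_W$ is the allowed deviation.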

We use the following procedure to generate dsDNA of lengths up to $(2M_{DNA} - N_W)$ nts through the use of Klenow fragment to extend synthetic ssDNA fragments that are not more than $M_{DNA}$ nts long. When a pair of long oligo-nts, complementary for $N_W$ nts at their 3' ends, are annealed, there will be a free 3' hydroxyl and a long ssDNA chain continuing in the 5' direction on either side. We will refer to this situation as a 5' superoverhang. The procedure comprises:

1) picking a non-palindromic subsequence of $N_W$ nts near the center of the dsDNA to be synthesized; this region is called the overlap (typically, $N_W$ is 10),

2) synthesizing an ssDNA molecule that comprises that part of the anti-sense strand from its 5' end up to and including the overlap,

3) synthesizing an ssDNA molecule that comprises that part of the sense strand from its 5' end up to and including the overlap,

4) annealing the two synthetic strands, which are complementary throughout the overlap region, and

5) extending both superoverhangs with Klenow fragment and all four deoxynucleotide triphosphates.

Because $M_{DNA}$ is not rigidly fixed at 100, the current limits of 190 (= $2M_{DNA} - N_W$) nts overall and 100 nts in each fragment are not rigid, but can be exceeded by 5 or 10 nts. Going beyond the limits of 190 and 100 will lead to lower yields, but these may be acceptable in certain cases.
Restriction enzymes do not cut at sites closer than about five base pairs from the end of blunt dsDNA fragments (OLIP87). Therefore $N_S$ nts of spacer (with $N_S$ typically set to 5) are added to ends that we intend to cut with a restriction enzyme. If the plasmid is to be cut with a blunt-cutting enzyme, then we do not add any spacer to the corresponding end of the dsDNA fragment.

To choose the optimum site of overlap for the oligo-nt fragments, first consider the anti-sense strand of the DNA to be synthesized, including any spacers at the ends, written (in upper case) from 5' to 3' and left to right. The $N_W$-nt-long overlap window can never include bases that are to be variegated. The overlap also should not be palindromic, lest single DNA molecules prime themselves. Place an $N_W$-nt-long window as close to the center of the anti-sense sequence as possible. Check whether one or more codons within the window can be changed to increase the GC content without: a) destroying a needed restriction site, b) changing the amino acid sequence, or c) making the overlap region palindromic. If possible, change some AT base pairs to GC pairs. If the GC content of the window is less than 50%, slide the window right or left by as much as $Q_W$ nts to maximize the number of G's and C's inside the window, but without including any variegated bases. For each trial setting of the overlap window, maximize the GC content by silent codon changes, but do not destroy wanted restriction sites or make the overlap palindromic. If the best setting still has less than 50% GC, enlarge the window to $N_W + 2$ nts and place it within five nts of the center to obtain the maximum GC content. If enlarging the window by one or two nts will increase the GC content, do so, but do not include variegated bases.
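The window search, without the silent codon changes and the window enlargement, can be sketched in Python as below. Assumptions: `variegated` is a set of 0-based positions on the anti-sense strand, `q_w` is $Q_W$ rounded down to an integer, and ties in GC content go to the window nearest the centre. `best_overlap` is a hypothetical name.

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def is_palindromic(seq):
    """True if seq equals its own reverse complement (it could self-prime)."""
    return seq == seq.translate(COMPLEMENT)[::-1]


def gc_fraction(seq):
    return sum(base in "GC" for base in seq) / len(seq)


def best_overlap(antisense, variegated, n_w=10, q_w=5):
    """Return (start, window) with the highest GC content among windows that
    lie within q_w nts of centre, avoid variegated positions and are not
    palindromic; None if no window qualifies."""
    centre = (len(antisense) - n_w) // 2
    best = None
    for shift in sorted(range(-q_w, q_w + 1), key=abs):  # nearest centre first
        start = centre + shift
        if start < 0 or start + n_w > len(antisense):
            continue
        if any(p in variegated for p in range(start, start + n_w)):
            continue
        window = antisense[start:start + n_w]
        if is_palindromic(window):
            continue
        if best is None or gc_fraction(window) > gc_fraction(best[1]):
            best = (start, window)
    return best
```

A caller could rerun this with `n_w + 2` whenever the best window still falls below 50% GC, mirroring the enlargement step in the text.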

Underscore the anti-sense strand from the 5' end up to the right edge of the window. Write the complementary sense sequence 3'-to-5', left to right and in lower-case letters, under the anti-sense strand, starting at the left edge of the window and continuing all the way to the right end of the anti-sense strand.

We will synthesize the underscored anti-sense strand and the part of the sense strand that we wrote. These two fragments, complementary over the length of the window of high GC content, are mixed in equimolar quantities and annealed. These fragments are extended with Klenow fragment and all four deoxynucleotide triphosphates to produce ds blunt-ended DNA. This DNA can be cut with appropriate restriction enzymes to produce the cohesive ends needed to ligate this fragment to other DNA.
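The two oligos and the product of annealing plus Klenow fill-in can be modelled on strings. The Python sketch below is a bookkeeping check only, not a model of the chemistry. `design_oligos` and `klenow_fill` are hypothetical names, and `start` is the 0-based left edge of the chosen window on the anti-sense strand.

```python
COMPLEMENT = str.maketrans("ACGTacgt", "TGCAtgca")


def revcomp(seq):
    return seq.translate(COMPLEMENT)[::-1]


def design_oligos(antisense, start, n_w):
    """The two oligos to synthesize, both written 5' to 3'.

    top: the underscored anti-sense strand, 5' end through the window
    bottom: the sense strand, from the right end back through the window
    """
    top = antisense[:start + n_w]
    bottom = revcomp(antisense[start:])
    return top, bottom


def klenow_fill(top, bottom, n_w):
    """Pair the 3' ends over n_w nts, extend each strand on the other as
    template, and return the full-length anti-sense strand."""
    if top[-n_w:] != revcomp(bottom[-n_w:]):
        raise ValueError("3' ends are not complementary over the window")
    return top + revcomp(bottom)[n_w:]
```

For any window position, `klenow_fill(*design_oligos(s, start, n_w), n_w)` returns `s` again, which confirms that the two oligos together cover the whole duplex.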

Sec. 5.2: DNA Synthesis and Purification Methods:

The present invention is not limited to any particular method of DNA synthesis or construction. The following procedures exemplify one way to achieve the goals of the present invention.

DNA is synthesized on a Milligen 7500 DNA synthesizer (Milligen, a division of Millipore Corporation, Bedford, MA) by standard procedures. Software to control the synthesizer and to keep records of each synthesis is supplied by Milligen.
The following reagents are supplied by Milligen:

1) 1H-tetrazole in acetonitrile,

2) 2% (v/v) dichloroacetic acid in dichloromethane,

3) acetic anhydride in 2,6-lutidine/acetonitrile (1:1:8),

4) 6.5% dimethylaminopyridine in acetonitrile,

5) 0.1 M iodine in 2,6-lutidine/water/tetrahydrofuran (8:8:84),

6) 2% (v/v) triethylamine in acetonitrile,

7) DMT-dAdenosine(Bz) cyanoethylphosphoramidite,

8) DMT-dCytidine(Bz) cyanoethylphosphoramidite,

9) DMT-dGuanosine(iBu) cyanoethylphosphoramidite,

10) DMT-dThymidine cyanoethylphosphoramidite, and

11) anhydrous acetonitrile.

Tetrazole and acetonitrile are stored over molecular sieves to sequester water.

Phosphoramidites are dissolved in anhydrous acetonitrile (Milligen) at 0.1 g/ml. All other acetonitrile used in the syntheses is "Low-water Acetonitrile" supplied by J. T. Baker Chemical Company (Phillipsburg, NJ). Synthesis columns containing supports charged with an initial base for each of A, C, G, and T are obtained from Milligen in two types, high-loading and low-loading. High-loading columns are used for syntheses of oligo-nts containing up to 60 bases and contain between 35 and 70 micromoles of amidite/g of support. The exact amount varies from lot to lot. Low-loading columns, containing between 4 and 7 micromoles of amidite/g of support, are used for syntheses of oligo-nts containing 60 bases or more.
The Milligen 7500 has seven vials from which phosphoramidites may be taken. Normally, the first four contain A, C, T, and G. The other three vials may contain unusual bases, such as inosine, or mixtures of bases (the so-called "dirty bottle"). The standard software allows programmed mixing of two, three, or four bases in equimolar quantities.

When a synthesis is complete, the DNA is removed from the support by incubating the supports in 1 ml of fresh 28-30% ammonium hydroxide solution (EM Science, a division of EM Industries, Inc., Cherry Hill, NJ) for 15 hours at 50 degrees C. The solution is dried under vacuum, and the DNA is resuspended in 200 microliters of HPLC-grade water (Baker-Analyzed Reagent®, J.T. Baker Chemical Co.) and purified by high-pressure liquid chromatography (HPLC) or PAGE.

With low-loading supports, a …-base-long oligo-nt is typically obtained at 1-2% of theoretical yield, i.e., 10 μg; a 100-base-long oligo-nt is typically obtained in 0.5% of theoretical yield, i.e., …. With high-loading supports, 1 mg of a …-base-long oligo-nt is typically obtained.

The present invention is not limited to any particular method of purifying DNA for genetic engineering. HPLC is used for both oligo-nts and fragments of several kb. Alternatively, agarose gel electrophoresis and electroelution on an IBI device (International Biotechnologies, Inc., New Haven, CT) are used to purify large dsDNA fragments. For oligo-nts, PAGE and electroelution with an Epigene device (Epigene Corp., Baltimore, MD) are an alternative to HPLC. One alternative for DNA purification is HPLC on a Waters (division of Millipore Corporation) HPLC system using the GenPak™ FAX column. A sample of 100 picograms (pg) to 10 μg can be loaded and recovered in …-80% yield. The recovery varies with the size and concentration of the DNA, and with whether it is single or double stranded. A NAP-5 column from Pharmacia (Sweden) is used to desalt DNA eluted from the GenPak column. After passage over the NAP-5 column, the DNA solution is vacuum desiccated.

Sec. 6.1: Cloning of Known osp-ipbd Gene into OCV:

In this section, we clone the osp-ipbd gene or the display probe that we have designed. In the preferred method, the synthetic gene is constructed using plasmids that are transformed into bacterial cells by standard methods (MANI82, p250) or slightly modified standard methods. Alternatively, DNA fragments derived from nature are operably linked to other fragments of DNA derived from nature or to synthetic DNA fragments. In most cases of the preferred method, gene synthesis involves construction of a series of plasmids containing larger and larger segments of the complete gene. Each plasmid that contains a newly added portion of the osp-ipbd gene or of the display probe is tested by restriction digestion. Plasmids having the expected restriction digestion pattern are sequenced in the region of the latest alteration to confirm the synthesis.

If, for convenience, small plasmids were used for gene synthesis, the complete osp-ipbd gene or display probe is subcloned into the OCV at this point.



Sec. 6.2: Cloning of Random DNA (Potential osp) into Display Probe:

If random DNA and phenotypic selection or screening are used to obtain a GP(IPBD), then we clone random DNA into one of the restriction sites that was designed into the display probe.

The random DNA may be obtained in a variety of ways. Degenerate synthetic DNA is one possibility. Alternatively, pseudorandom DNA may be taken from nature. If, for example, an SphI site (GCATG/C) has been designed into the display probe at one end of the ipbd fragment, then we would use NlaIII (CATG/) to partially digest some DNA that contains a wide variety of sequences, generating a wide variety of fragments with CATG 3' overhangs. Preferably, the display probe is designed with different restriction sites at each end of the ipbd gene so that random DNA can be cloned at either end at the user's discretion. The genome of an organism would be a suitable source of DNA with high sequence diversity.
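A partial NlaIII digest can be simulated on the top strand to see what fragment population a given degree of digestion produces. The Python sketch assumes that each CATG site is cut independently with probability `p_cut`, which is a simplification of real partial-digestion kinetics. `nlaiii_partial_digest` is a hypothetical name.

```python
import random


def nlaiii_partial_digest(seq, p_cut, seed=0):
    """Top-strand fragments after a simulated partial NlaIII (CATG^) digest.

    Each CATG site is cut independently with probability p_cut (assumption).
    Fragments cut at their right end finish in CATG, the 3' overhang that
    can anneal to an SphI-cut (GCATG^C) end.
    """
    rng = random.Random(seed)
    fragments, last = [], 0
    pos = seq.find("CATG")
    while pos != -1:
        cut = pos + 4                       # NlaIII cuts after the G of CATG
        if rng.random() < p_cut:
            fragments.append(seq[last:cut])
            last = cut
        pos = seq.find("CATG", pos + 1)
    if last < len(seq):
        fragments.append(seq[last:])
    return fragments
```

Lowering `p_cut` lengthens the average fragment, which corresponds to the "different degree of partial digestion" option listed in Sec. 9.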

A plasmid carrying the display probe is digested with the appropriate restriction enzyme, and the fragmented random DNA is annealed and ligated by standard methods. The ligated plasmids are used to transform cells that are grown and selected for expression of the antibiotic-resistance gene. Plasmid-bearing GPs are then selected for the display-of-IPBD phenotype by the procedure given in Sec. 15 of the present invention, using AfM(IPBD) as if it were the target. Sec. 15 is designed to isolate GP(PBD)s that bind to a target from a large population that do not bind. Use of the procedure of Sec. 15 to isolate a genetic construction that leads to the display of a single type of IPBD differs from the designed use in one important way: any GP that displays the IPBD will bind tightly and GPs that do not display IPBD will not bind; hence any reasonable amount of AfM(IPBD) on the matrix will identify a successful clone.

As an alternative to selecting GP(IPBD)s through binding to an affinity column, we can isolate colonies or plaques and screen them using one of the methods listed in Sec. 8 to identify clonal isolates that display IPBD on the GP outer surface.



Sec. 7: Harvest of GPs:



After transforming cells with ligated cloning vectors, we first grow the GPs under nonselective conditions to allow expression of the antibiotic-resistance markers on the cloning vector. After a grow-out, we apply selective pressure to kill untransformed cells.

GPs are harvested by methods appropriate to the GP at hand: generally, centrifugation to pelletize GPs and resuspension of the pellets in sterile medium (cells) or buffer (spores or phage).

Sec. 8: Verification of Display Strategy:

The harvested packages are now tested to determine whether the IPBD is present on the surface. In any tests of GPs for the presence of IPBD on the GP surface, any ions or cofactors known to be essential for the stability of IPBD or AfM(IPBD) must be included at appropriate levels. The tests can be done: a) by affinity labeling, b) enzymatically, c) spectrophotometrically, d) by affinity separation, or e) by affinity precipitation. The AfM(IPBD) in this step is one picked to have strong affinity (preferably, < 10^-… M) for the IPBD molecule and little or no affinity for the wtGP. For example, if BPTI were the IPBD, trypsin, anhydrotrypsin, or antibodies to BPTI could be used as the AfM(BPTI) to test for the presence of BPTI. Anhydrotrypsin, a trypsin derivative with serine 195 converted to dehydroalanine, has no proteolytic activity but retains its affinity for BPTI (AKOH72 and HUBE77).

Preferably, the presence of the IPBD on the surface of the GP is demonstrated through the use of a soluble, labeled derivative of an AfM(IPBD) with high affinity for IPBD. The label could be: a) a radioactive atom such as 125I, b) a chemical entity such as biotin, or c) a fluorescent entity such as rhodamine or fluorescein. The labeled derivative of AfM(IPBD) is denoted as AfM(IPBD)*. The preferred procedure is:

1) mix AfM(IPBD)* with GPs that are to be tested for the presence of IPBD; conditions of mixing should favor binding of IPBD to AfM(IPBD)*,

2) separate GPs from unbound AfM(IPBD)* by use of:

   a) a molecular sizing filter that will pass AfM(IPBD)* but not GPs,

   b) centrifugation, or

   c) a molecular sizing column (such as Sepharose or Sephadex) that retains free AfM(IPBD)* but not GPs, and

3) quantitate the AfM(IPBD)* bound by GPs.

Alternatively, if the IPBD has a known biochemical activity (enzymatic or inhibitory), its presence on the GP can be verified through this activity. For example, if the IPBD were BPTI, then one could use the stoichiometric inactivation of trypsin not only to demonstrate the presence of BPTI, but also to quantitate the amount.
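Because BPTI inactivates trypsin stoichiometrically, the loss of trypsin activity converts directly into copies displayed per GP. The Python sketch below rests on several assumptions: a 1:1 inhibitor:trypsin complex, trypsin in excess, no inhibition by the wtGP, and a GP count obtained separately (for example, from a titre). `ipbd_per_gp` is a hypothetical name.

```python
AVOGADRO = 6.02214076e23  # entities per mole


def ipbd_per_gp(trypsin_mol, residual_activity, gp_count):
    """Copies of a 1:1 trypsin inhibitor (e.g. BPTI) displayed per GP.

    trypsin_mol: moles of active trypsin added
    residual_activity: fraction of trypsin activity left after mixing with GPs
    gp_count: number of GPs in the sample
    """
    if not 0.0 <= residual_activity <= 1.0:
        raise ValueError("residual_activity must lie between 0 and 1")
    inhibited_mol = trypsin_mol * (1.0 - residual_activity)
    return inhibited_mol * AVOGADRO / gp_count
```

In practice the wtGP control measured under the same conditions would be subtracted before this conversion.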

If the IPBD has strong, characteristic absorption bands in the visible or UV that are distinct from absorption by the wtGP, then another alternative for measuring the IPBD displayed on the GP is a spectrophotometric measurement. For example, if the IPBD were azurin, the visible absorption could be used to identify GPs that display azurin.

Another alternative is to label the GPs and measure the amount of label retained by immobilized AfM(IPBD). For example, the GPs could be grown with a radioactive precursor, such as 32P or 3H-thymidine, and the radioactivity retained by immobilized AfM(IPBD) measured.

Another alternative is to use affinity chromatography; the ability of a GP bearing the IPBD to bind a matrix (cf. Sec. 15.1) that supports an AfM(IPBD) is measured by reference to the wtGP.

Another alternative for detecting the presence of 
IPBD on the GP surface is affinity precipitation.
If random DNA has been used, then the procedures of Sec. 15 are used to obtain a clonal isolate that has the display-of-IPBD phenotype. Alternatively, clonal isolates may be screened for the display-of-IPBD phenotype. The tests of this step are applied to one or more of these clonal isolates.

If no isolates that bind to the affinity molecule are obtained, we take corrective action as disclosed in Sec. 9.

If one or more of the tests above indicates that the IPBD is displayed on the GP surface, we verify that the binding of molecules having known affinity for IPBD is due to the chimeric osp-ipbd gene through the use of standard genetic and biochemical techniques, such as:

1) transferring the osp-ipbd gene into the parent GP to verify that osp-ipbd confers binding,

2) deleting the osp-ipbd gene from the isolated GP to verify that loss of osp-ipbd causes loss of binding,

3) showing that binding of GPs to AfM(IPBD) correlates with [XINDUCE] (in those cases where expression of osp-ipbd is controlled by [XINDUCE]), and

4) showing that binding of GPs to AfM(IPBD) is specific to the immobilized AfM(IPBD) and not to the support matrix.
The following vary linearly with the amount of IPBD displayed: a) binding of GPs by soluble AfM(IPBD)*, b) absorption caused by IPBD, and c) biochemical reactions of IPBD. Presence of IPBD on the GP surface is indicated by a strong correlation between [XINDUCE] and these linear measures. Leakiness of the promoter is not likely to present problems of high background with assays that are linear in the amount of IPBD. These experiments may be quicker and easier than the genetic tests. Interpreting the effect of [XINDUCE] on binding to an AfM(IPBD) column, however, may be problematic unless the regulated promoter is completely repressed in the absence of XINDUCE. The affinity retention of GP(IPBD)s is not linear in the number of IPBDs/GP; there may be, for example, little phenotypic difference between GPs bearing 5 IPBDs and GPs bearing 50 IPBDs. The demonstration that binding is to AfM(IPBD), together with the genetic tests, is essential; the tests with XINDUCE are optional.
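The "strong correlation" between [XINDUCE] and a linear readout can be quantified with an ordinary Pearson coefficient. The Python sketch below is one possible statistic; the text does not name one, and any cut-off for "strong" would be an additional assumption. `pearson_r` is a generic helper, not part of the described procedure.

```python
import math


def pearson_r(xs, ys):
    """Pearson correlation coefficient of two equal-length series."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two equal-length series of at least 2 points")
    mx = sum(xs) / len(xs)
    my = sum(ys) / len(ys)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    if sxx == 0 or syy == 0:
        raise ValueError("a series is constant; correlation is undefined")
    return sxy / math.sqrt(sxx * syy)
```

Here `xs` would hold the [XINDUCE] values and `ys` a readout that is linear in the amount of IPBD, such as bound AfM(IPBD)* per GP. Affinity-column retention should not be used for `ys`, because, as noted above, it is not linear in IPBDs/GP.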

We sequence the relevant ipbd gene fragment from each of several clonal isolates to determine the construction.

We establish the maximum salt concentration and the pH range for which the GP(IPBD) binds the chosen AfM(IPBD). This is preferably done by measuring, as a function of salt concentration and pH, the retention of AfM(IPBD)* on molecular sizing filters that pass AfM(IPBD)* but not GPs.



If the IPBD is displayed on the outside of the GP, and if that display is clearly caused by the introduced osp-ipbd gene, we proceed to Part II; otherwise we must analyze the result and adopt appropriate corrective action.



Sec. 9: Reforming the Display System:

If we have attempted to fuse an ipbd fragment to a natural osp fragment, our options are:

1) pick a different fusion to the same osp by:

   a) using the opposite end of osp,

   b) keeping more or fewer residues from osp in the fusion, for example, in increments of 3 or 4 residues,

   c) trying a known or predicted domain boundary, or

   d) trying a predicted loop or turn position,

2) pick a different osp, or

3) switch to the random DNA method.



If we have just tried the random DNA method unsuccessfully, our options are:

1) choose a different relationship between the ipbd fragment and the random DNA (ipbd first and random DNA second, or vice versa),

2) try a different degree of partial digestion, a different enzyme for partial digestion, a different degree of shearing, or a different source of natural DNA, or

3) switch to the natural OSP method.
If all reasonable OSPs of the current GP have been tried and the random DNA method has been tried, both without success, we pick a new GP.

Summary of Part I:

In Part I, we have constructed a GP(IPBD). Although the target material is not picked until Part III, we have already discussed the general properties of targets that influence the choice of IPBD. The user may use the first GP(IPBD) as the starting point for design and construction of other GPs: GP(IPBD1), GP(IPBD2), and so on. The different IPBDs might differ in charge and size in such a way that, for any target, at least one of the GP(IPBD)s will be appropriate as a starting point to develop a protein that will bind to that target.

Part II

Sec. 10.0: Affinity Separation Means:

In Part II we optimize an affinity separation system that will be used in Part III to enrich a population of GP(vgPBD)s for those GP(PBD)s that display PBDs with increased affinity for the target.

Affinity chromatography is the preferred means, but FACS, electrophoresis, or other means may also be used.



Sec. 10.1: Optimization of Affinity Chromatographic Separation:
For linear gradients, elution volume and eluant concentration are directly related. Changes in eluant concentration cause GPs to elute from the column. Elution volume, however, is more easily measured and specified. It is to be understood that the eluant concentration is the agent causing GP release and that an eluant concentration can be calculated from an elution volume and the specified gradient.
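The conversion from elution volume to eluant concentration for a linear gradient is a simple interpolation. The sketch below assumes, hypothetically, that the gradient starts at elution volume zero, rises linearly from a starting to a final concentration over a stated gradient volume, and is then held at the final concentration; the function name and parameters are illustrative, not taken from the text.

```python
def eluant_concentration(elution_volume: float,
                         c_start: float,
                         c_end: float,
                         gradient_volume: float) -> float:
    """Eluant concentration at a given elution volume for a linear gradient.

    Assumes (hypothetically) that the gradient begins at elution volume 0,
    rises linearly from c_start to c_end over gradient_volume, and is held
    at c_end afterwards. Volumes share one unit; concentrations another.
    """
    if gradient_volume <= 0:
        raise ValueError("gradient_volume must be positive")
    if elution_volume <= 0:
        return c_start
    if elution_volume >= gradient_volume:
        return c_end
    fraction = elution_volume / gradient_volume
    return c_start + fraction * (c_end - c_start)


# Example: a 10 mM -> 1000 mM KCl gradient over 5 column volumes (hypothetical
# values); a GP eluting after 2.5 column volumes left at the midpoint.
print(eluant_concentration(2.5, 10.0, 1000.0, 5.0))  # 505.0
```

Reporting elution volumes and converting them this way keeps the eluant concentration, the quantity that actually releases GPs, recoverable from routine measurements.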

Using a specified elution regime, we compare the elution volumes of GP(IPBD)s with the elution volumes of wtGP on affinity columns supporting AfM(IPBD). Comparisons are made at various: a) amounts of IPBD/GP, b) densities of AfM(IPBD)/(volume of matrix) (DoAMoM), c) initial ionic strengths, d) elution rates, e) amounts of GP/(volume of support), f) pHs, and g) temperatures, because these are the parameters most likely to affect the sensitivity and efficiency of the separation. We then pick those conditions giving the best separation.

We do not optimize pH or temperature; rather we record optimal values for the other parameters for one or more values of pH and temperature. The pH used must be within the range of pH for which GP(IPBD) binds the AfM(IPBD) that is being used in this step. The conditions of intended use, specified by the user (Sec. 11), may include a specification of pH or temperature. If pH is specified, then pH will not be varied in eluting the column (Sec. 15.3). Decreasing pH may, however, be used to liberate bound GPs from the matrix. Similarly, if the intended use specifies a temperature, we will hold the affinity column at the specified temperature during elution, but we might vary the temperature during recovery. If the intended use specifies the pH or temperature, then we prefer that the affinity separation be optimized for all other parameters at the specified pH and temperature.

In the optimization devised in this step, we preferably use a molecule known to have moderate affinity for the IPBD (K_d in the range 10^? M to 10^? M), for the following reason. When populations of GP(vgPBD)s are fractionated, there will be roughly three subpopulations: a) those with no binding, b) those that have some binding but can be washed off with high salt or low pH, and c) those that bind very tightly and must be rescued in situ. We optimize the parameters to separate (a) from (b) rather than (b) from (c). Let PBD_w be a PBD having weak binding to the target and PBD_s be a PBD having strong binding. Higher DoAMoM might, for example, favor retention of GP(PBD_w) but also make it very difficult to elute viable GP(PBD_s). We will optimize the affinity separation to retain GP(PBD_w) rather than to allow release of GP(PBD_s), because a tightly bound GP(PBD_s) can be rescued by in situ growth. If we find that DoAMoM strongly affects the elution volume, then in Part III we may reduce the amount of target on the affinity column when an SBD has been found with moderately strong affinity (K_d on the order of 10^? M) for the target.

In case the promoter of the osp-ipbd gene is not regulated by a chemical inducer, we optimize DoAMoM, the elution rate, and the amount of GP/volume of matrix. If the optimized affinity separation is acceptable, we proceed. If not, we must develop a means to alter the amount of IPBD per GP. Among GPs considered in the present invention, this case could arise only for spores, because regulatable promoters are available for all other systems.

If the amount of IPBD/spore is too high, we could engineer an operator site into the osp-ipbd gene. We choose the operator sequence such that a repressor sensitive to a small diffusible inducer recognizes the operator. Alternatively, we could alter the Shine-Dalgarno sequence to produce a lower homology with consensus Shine-Dalgarno sequences. If the amount of IPBD/spore is too low, we can introduce variability into the promoter or Shine-Dalgarno sequences and screen colonies for higher amounts of IPBD/spore.

In this step, we measure elution volumes of genetically pure GPs that elute from the affinity matrix as sharp bands that can be detected by UV absorption. Alternatively, samples from effluent fractions can be plated on suitable medium (cells or spores) or on sensitive cells (phage), and colonies or plaques counted.

Several values of IPBD/GP, DoAMoM, elution rates, initial ionic strengths, and loadings should be examined. The following is only one of many ways in which the affinity separation could be optimized. We anticipate that optimal values of IPBD/GP and DoAMoM will be correlated and therefore should be optimized together. The effects of initial ionic strength, elution rate, and amount of GP/(matrix volume) are unlikely to be strongly correlated, and so they can be optimized independently.
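The strategy above (search correlated parameters jointly, the rest one at a time) can be sketched as a grid search followed by a coordinate search. In this sketch `score` is a hypothetical stand-in for an experimentally measured separation quality, and `toy_score`, the parameter names, and the grid values are illustrative assumptions only.

```python
import math
from itertools import product
from typing import Callable, Dict, List

Params = Dict[str, float]


def optimize_separation(score: Callable[[Params], float],
                        start: Params,
                        coupled: Dict[str, List[float]],
                        independent: Dict[str, List[float]]) -> Params:
    """Grid-search coupled parameters jointly, then the others one at a time.

    `score` is a hypothetical placeholder for measured separation quality
    (higher is better).
    """
    best = dict(start)
    names = list(coupled)
    # Correlated parameters (e.g. IPBD/GP and DoAMoM): try every combination.
    best_combo = max(
        product(*(coupled[n] for n in names)),
        key=lambda combo: score({**best, **dict(zip(names, combo))}),
    )
    best.update(zip(names, best_combo))
    # Weakly correlated parameters: optimize each independently.
    for name, values in independent.items():
        best[name] = max(values, key=lambda v: score({**best, name: v}))
    return best


def toy_score(p: Params) -> float:
    # Hypothetical smooth response peaking at IPBD/GP = 100, DoAMoM = 0.1,
    # initial ionic strength = 10 mM and elution rate = 0.25 (relative units).
    return -(math.log10(p["ipbd_per_gp"] / 100.0) ** 2
             + math.log10(p["ipbd_per_gp"] * p["doamom"] / 10.0) ** 2
             + ((p["ionic_mM"] - 10.0) / 10.0) ** 2
             + math.log2(p["rate"] / 0.25) ** 2)


result = optimize_separation(
    toy_score,
    start={"ipbd_per_gp": 10.0, "doamom": 1.0, "ionic_mM": 1.0, "rate": 1.0},
    coupled={"ipbd_per_gp": [1.0, 10.0, 100.0, 1000.0],
             "doamom": [1.0, 0.1, 0.01, 0.001]},
    independent={"ionic_mM": [1.0, 5.0, 10.0, 20.0],
                 "rate": [1.0, 0.5, 0.25, 0.125, 0.0625]},
)
print(result)
```

The joint grid costs the product of the two grid sizes, which is why only the parameters expected to interact are searched this way.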

For each set of parameters to be tested, the column is eluted in a specified manner. For example, we may use a regime called Elution Regime 1: a KCl gradient runs from 10 mM to the maximum allowed for GP(IPBD) viability in 100 fractions of 0.05 V_t, followed by 20 fractions of 0.05 V_t at the maximum allowed KCl; the pH of the buffer is maintained at the specified value with a convenient buffer such as phosphate, Tris, or MOPS. Other elution regimes can be used; what is important is that the conditions of this optimization be similar to the conditions that are used in Part III for selection for binding to target (Sec. 15.3) and recovery of GPs from the chromatographic system (Sec. 15.4).
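Elution Regime 1 can be written out as a fraction-by-fraction schedule. This is a sketch assuming the gradient is linear and that each fraction is labeled with the KCl concentration reached at its end; volumes are in multiples of V_t as in the text, and the maximum KCl value used in the example is hypothetical (it must be determined for each GP(IPBD)).

```python
from typing import List, Tuple


def elution_regime_1(c_max_mM: float,
                     c_start_mM: float = 10.0,
                     gradient_fractions: int = 100,
                     hold_fractions: int = 20,
                     fraction_volume: float = 0.05) -> List[Tuple[float, float]]:
    """Return (cumulative volume in V_t, KCl in mM) at the end of each fraction.

    c_max_mM is the highest KCl concentration that still allows GP(IPBD)
    viability; it must be found experimentally.
    """
    schedule = []
    # Linear gradient from c_start_mM to c_max_mM.
    for i in range(1, gradient_fractions + 1):
        kcl = c_start_mM + (c_max_mM - c_start_mM) * i / gradient_fractions
        schedule.append((i * fraction_volume, kcl))
    # Hold at the maximum allowed KCl.
    for j in range(1, hold_fractions + 1):
        schedule.append(((gradient_fractions + j) * fraction_volume, c_max_mM))
    return schedule


plan = elution_regime_1(c_max_mM=1000.0)  # 1000 mM is a hypothetical limit
print(len(plan), plan[0], plan[99], plan[-1])
```

The whole regime therefore spans 120 fractions, or 6 V_t of eluant.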

When the osp-ipbd gene is regulated by [XINDUCE], IPBD/GP can be controlled by varying [XINDUCE]. Appropriate values of [XINDUCE] depend on the identity of XINDUCE and the promoter; if, for example, XINDUCE is isopropylthiogalactoside (IPTG) and the promoter is lacUV5, then [IPTG] = 0, 0.1 µM, 1.0 µM, 10.0 µM, 100.0 µM, and 1.0 mM would be appropriate levels to test. The range of variation of [XINDUCE] is extended until an optimum is found or an acceptable level of expression is obtained.

DoAMoM is varied from the maximum that the matrix material can bind to 1% or 0.1% of this level in appropriate steps. We anticipate that the efficiency of separation will be a smooth function of DoAMoM, so that it is appropriate to cover a wide range of values for DoAMoM with a coarse grid and then explore the neighborhood of the approximate optimum with a finer grid.
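The coarse-then-fine scan of DoAMoM can be sketched as a decade grid followed by a finer logarithmic grid between the neighbors of the coarse optimum. The scoring function, the one-point-per-decade coarse grid, and the nine-point fine grid are assumptions chosen for illustration.

```python
import math
from typing import Callable


def coarse_to_fine(score: Callable[[float], float],
                   d_max: float,
                   lowest_fraction: float = 1e-3,
                   fine_points: int = 9) -> float:
    """Scan DoAMoM on a coarse decade grid, then refine around the best point.

    `score` is a hypothetical stand-in for measured separation quality
    (higher is better) at a given DoAMoM.
    """
    decades = round(-math.log10(lowest_fraction))
    # Coarse grid: d_max, d_max/10, ..., d_max * lowest_fraction.
    coarse = [d_max * 10.0 ** -k for k in range(decades + 1)]
    i = max(range(len(coarse)), key=lambda k: score(coarse[k]))
    # Bracket the coarse optimum by its neighbours (clamped at the ends).
    hi = coarse[max(i - 1, 0)]
    lo = coarse[min(i + 1, len(coarse) - 1)]
    ratio = hi / lo
    # Fine grid: evenly spaced on a log scale between lo and hi.
    fine = [lo * ratio ** (j / (fine_points - 1)) for j in range(fine_points)]
    return max(fine, key=score)


# Hypothetical response with its optimum at 3% of the maximum density.
best = coarse_to_fine(lambda d: -math.log10(d / 0.03) ** 2, d_max=1.0)
print(round(best, 4))
```

A logarithmic grid is used because the text describes DoAMoM ranging over two to three orders of magnitude.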

Several values of initial ionic strength are tested, such as 1.0 mM, 5.0 mM, 10.0 mM, and 20.0 mM. Low ionic strength favors binding between oppositely charged groups, but could also cause GPs to precipitate.

The elution rate is varied, by successive factors of 1/2, from the maximum attainable rate to 1/16 of this value. If the lowest elution rate tested gives the best separation, we test lower elution rates until we find an optimum or adequate separation.
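The halving series and its downward extension can be sketched as follows. Here `score` is a hypothetical stand-in for the measured separation quality, `adequate` is an optional threshold for "adequate separation", and the cap on extra halvings is an assumption added so the loop always terminates.

```python
import math
from typing import Callable, Dict, List


def choose_elution_rate(score: Callable[[float], float],
                        r_max: float,
                        adequate: float = math.inf,
                        max_extra_halvings: int = 10) -> float:
    """Test r_max, r_max/2, ..., r_max/16, then keep halving while the slowest wins.

    score(rate) is a hypothetical stand-in for measured separation quality
    (higher is better); each rate is measured once and the result reused.
    """
    rates: List[float] = [r_max / 2 ** k for k in range(5)]
    scores: Dict[float, float] = {r: score(r) for r in rates}
    best = max(scores, key=scores.get)
    extra = 0
    # Extend downward while the lowest rate tested is best and not yet adequate.
    while (best == rates[-1] and scores[best] < adequate
           and extra < max_extra_halvings):
        new_rate = rates[-1] / 2
        rates.append(new_rate)
        scores[new_rate] = score(new_rate)
        best = max(scores, key=scores.get)
        extra += 1
    return best


# Hypothetical response whose optimum is at 1/64 of the maximum rate.
best_rate = choose_elution_rate(lambda r: -math.log2(r * 64) ** 2, r_max=1.0)
print(best_rate)
```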



The goal of the optimization is to obtain a sharp transition between bound and unbound GPs, triggered by increasing salt or decreasing pH or a combination of both. This optimization need be performed only: a) for each temperature to be used, b) for each pH to be used, and c) when a new GP(IPBD) is created.



Sec. 10.2: Measuring the sensitivity of affinity separation:



Once the values of IPBD/GP, DoAMoM, initial ionic strength, elution rate, and amount of GP/(volume of affinity support) have been optimized, we determine the sensitivity of the affinity separation (C_sensi) by the following procedure, which measures the minimum quantity of GP(IPBD) that can be detected in the presence of a large excess of wtGP. The user chooses a number of separation cycles, denoted N_chrom, that will be performed before an enrichment is abandoned; preferably, N_chrom is in the range 6 to 10, and N_chrom must be greater than 4. Enrichment can be terminated by isolation of a desired GP(SBD) before N_chrom passes.

The measurement of sensitivity is significantly expedited if GP(IPBD) and wtGP carry different selectable markers, because such markers allow easy
GP(IPBD) is not detected after N_chrom passes, V_lim is decreased and the process is repeated.

Once a value for V_lim is found that allows recovery of GP(IPBD)s, the factor by which V_lim is varied is reduced and additional values are tested
is known to within a factor of two.
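The search for V_lim (coarse steps until recovery succeeds, then smaller steps until V_lim is known to within a factor of two) can be sketched as a geometric bracketing search. This sketch treats V_lim as the fold excess of wtGP over GP(IPBD) in the loaded mixture and assumes recovery is monotone in V_lim; `recovered`, the starting value, and the tenfold coarse step are hypothetical.

```python
import math
from typing import Callable, Tuple


def bracket_v_lim(recovered: Callable[[float], bool],
                  v_start: float = 1e6,
                  step: float = 10.0) -> Tuple[float, float]:
    """Return (lo, hi) with recovery at lo, no recovery at hi, and hi/lo <= 2.

    recovered(v) is a hypothetical stand-in for running up to N_chrom
    separation cycles on a mixture with a v-fold excess of wtGP and
    reporting whether GP(IPBD) was recovered. Recovery is assumed to be
    monotone: if it fails at some v, it fails at every larger v.
    """
    v = v_start
    if recovered(v):
        while recovered(v * step):   # coarse steps upward
            v *= step
    else:
        while not recovered(v):      # coarse steps downward
            v /= step
    lo, hi = v, v * step             # lo recovers, hi does not
    while hi / lo > 2.0:             # refine geometrically
        mid = math.sqrt(lo * hi)
        if recovered(mid):
            lo = mid
        else:
            hi = mid
    return lo, hi


# Hypothetical system whose true limit is a 3.7e5-fold excess of wtGP.
lo, hi = bracket_v_lim(lambda v: v <= 3.7e5)
print(f"V_lim lies between {lo:.3g} and {hi:.3g}")
```

The lower bound `lo` is the largest value actually shown to allow recovery, which corresponds to the C_sensi estimate.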



C_sensi equals the highest value of V_lim for which the user can recover GP(IPBD) within N_chrom passes. The number of chromatographic cycles (N_cyc) that were needed to isolate GP(IPBD) gives a rough estimate of C_eff; C_eff is approximately the N_cyc-th root of V_lim:

$$C_{\mathrm{eff}} \approx \exp\!\left(\frac{\ln V_{\mathrm{lim}}}{N_{\mathrm{cyc}}}\right)$$

For example, if V_lim = 4.0 × 10^8 and three separation cycles were needed to isolate GP(IPBD), then C_eff ≈ 736.
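The rough estimate of C_eff is a one-line calculation. The sketch below simply evaluates the formula above and reproduces the worked example; the input check is an added assumption (the root is only meaningful for V_lim > 1 and at least one cycle).

```python
import math


def estimate_c_eff(v_lim: float, n_cyc: int) -> float:
    """Rough per-cycle enrichment: the N_cyc-th root of V_lim.

    Computed as exp(ln(V_lim) / N_cyc), matching the formula in the text.
    """
    if v_lim <= 1 or n_cyc < 1:
        raise ValueError("need V_lim > 1 and N_cyc >= 1")
    return math.exp(math.log(v_lim) / n_cyc)


# The worked example: V_lim = 4.0e8 and three separation cycles.
print(f"{estimate_c_eff(4.0e8, 3):.1f}")  # prints 736.8
```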

Sec. 10.3: Measuring the efficiency of separation:

To determine C_eff more accurately, we determine the ratio of GP(IPBD)/wtGP loaded onto an AfM(IPBD) column that yields approximately equal amounts of GP(IPBD) and wtGP after elution. We prepare mixtures of GP(IPBD) and wtGP in ratios GP(IPBD):wtGP :: 1:Q; we start Q at twenty times the approximate C_eff found in Sec. 10.2. A 1:Q mixture of GP(IPBD) and wtGP is applied to an AfM(IPBD) column and eluted by the specified elution regime, such as Elution Regime 1. A sample of the last fraction that contains viable GPs is plated at a dilution that gives well separated colonies or plaques. The presence of IPBD or the osp-ipbd gene in each colony or plaque can be determined by a number of standard methods, including: a) use of different selectable markers, b) nitrocellulose filter lift of GPs and detection with AfM(IPBD)* (AUSU87), or c) nitrocellulose filter lift of GPs and detection with radiolabeled DNA that is complementary to the osp-ipbd gene (AUSU87). Let F be the fraction of GP(IPBD) colonies found in the last fraction containing viable GPs. When a Q is found such that 0.20 < F < 0.80, then

$$C_{\mathrm{eff}} = Q$$

If F < 0.2, then we reduce Q by an appropriate factor (e.g., 10) and repeat the procedure. If F > 0.8, then we increase Q by an appropriate factor (e.g., 2) and repeat the procedure.
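The adjustment of Q can be sketched as a short loop. Here `measure_f` is a hypothetical placeholder for running a 1:Q mixture and plating the last viable fraction; the toy model `toy_measure` (one pass turns 1:Q into C_eff:Q, so F = C_eff/(C_eff + Q)) and its value of 700 are assumptions used only to exercise the loop, and the factors 10 and 2 are illustrative defaults.

```python
from typing import Callable


def find_q(measure_f: Callable[[float], float],
           q_start: float,
           down: float = 10.0,
           up: float = 2.0,
           max_rounds: int = 50) -> float:
    """Adjust Q until 0.20 < F < 0.80; that Q is taken as C_eff.

    measure_f(q) stands in for running a 1:Q GP(IPBD):wtGP mixture and
    returning the fraction F of GP(IPBD) colonies in the last viable fraction.
    """
    q = q_start
    for _ in range(max_rounds):
        f = measure_f(q)
        if 0.20 < f < 0.80:
            return q
        # Too few GP(IPBD): lower Q; too many: raise Q.
        q = q / down if f <= 0.20 else q * up
    raise RuntimeError("no Q gave 0.20 < F < 0.80")


def toy_measure(q: float, true_c_eff: float = 700.0) -> float:
    # Toy model (an assumption): one pass enriches 1:Q to C_eff:Q.
    return true_c_eff / (true_c_eff + q)


q = find_q(toy_measure, q_start=20 * 736.0)  # start at 20x the rough C_eff
print(q, toy_measure(q))
```

Under this toy model, the acceptance window 0.20 < F < 0.80 pins C_eff to within a factor of four of the returned Q.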

Sec. 10.4: Other Separation Means:



Other separation means are optimized in a manner parallel to that used for affinity chromatography.



FACS is likely to be most appropriate for bacterial cells and spores because the sensitivity of the machines requires approximately 1000 molecules of fluorescent label bound to each GP to accomplish a separation. An appropriate commercial FACS machine is a FACStar from Becton-Dickinson, Mountain View, CA. To optimize FACS separation of GPs, we use a derivative of AfM(IPBD) that is labeled with a fluorescent molecule, denoted AfM(IPBD)*. The variables that must be optimized include: a) amount of IPBD/GP, b) concentration of AfM(IPBD)*, c) ionic strength, d) concentration of GPs, and e) parameters pertaining to operation of the FACS machine. Because AfM(IPBD)* and GPs interact in solution, the binding will be linear in both [AfM(IPBD)*] and [displayed IPBD]. Preferably, these two parameters are varied together. The other parameters can be optimized independently. The sensitivity and efficiency of the separation are determined in a manner parallel to those used for chromatography.



Electrophoresis is most appropriate to bacteriophage because of their small size. Serwer (SERW87) has reviewed use of agarose-gel electrophoresis to separate phage based on charge. Electrophoresis is a preferred separation means if the target is so small that chemically attaching it to a column or to a fluorescent label would essentially change the entire target. For example, chloroacetate ions contain only seven atoms and would be essentially altered by any linkage. GPs that bind chloroacetate would become more negatively charged than GPs that do not bind the ion, and so these classes of GPs could be separated.
The parameters to optimize for electrophoresis include: a) IPBD/GP, b) concentration of gel material, e.g., agarose, c) concentration of AfM(IPBD), d) ionic strength, e) size, shape, and cooling capacity of the electrophoresis apparatus, f) voltages and currents, and g) concentration of GPs. Preferably, IPBD/GP and [AfM(IPBD)] are varied at the same time and other parameters are optimized independently.

In Part II we have determined optimal conditions for separating GPs based on proteins displayed on the GP surface. We have also determined the capabilities of the affinity separation system. Knowledge of these capabilities allows us to choose appropriate levels of variegation in Part III.

Part III 

Sec. 11.0: Choice of target material:

Any material may be chosen as target material, subject only to the following restrictions:

If affinity chromatography is to be used, then:

1) the molecules of the target material must be of sufficient size and chemical reactivity to be applied to a solid support suitable for affinity separation,

2) after application to a matrix, the target material must not react with water,

3) after application to a matrix, the target material must not bind or degrade proteins in a non-specific way, and

4) the molecules of the target material must be sufficiently large that attaching the material to a matrix allows enough unaltered surface area (generally at least 500 Å^2, excluding the atom that is connected to the linker) for protein binding.



If FACS is to be used as the affinity separation means, then:

1) the molecules of the target material must be of sufficient size and chemical reactivity to be conjugated to a suitable fluorescent dye, or the target must itself be fluorescent,

2) after any necessary fluorescent labeling, the target must not react with water,

3) after any necessary fluorescent labeling, the target material must not bind or degrade proteins in a non-specific way, and

4) the molecules of the target material must be sufficiently large that attaching the material to a suitable dye allows enough unaltered surface area (generally at least 500 Å^2, excluding the atom that is connected to the linker) for protein binding.

If affinity electrophoresis is to be used, then:

1) the target must either be charged or be of such a nature that its binding to a protein will change the charge of the protein,

2) the target material must not react with water,

3) the target material must not bind or degrade proteins in a non-specific way, and

4) the target must be compatible with a suitable gel material.

Possible target materials include, but are not limited to:

4) yeast phenylalanyl tRNA

5) asbestos

6) alpha-fetoprotein

7) ras proteins

8) low density lipoprotein

9) prostaglandin PGE2

10) alpha interferon

11) melittin

12) adenylate cyclase toxin

13) aflatoxin B1

14) aspartame

15) haem

16) bilirubin

17) morphine

18) codeine

19) dichlorodiphenyltrichloroethane (DDT)

20) benzo(a)pyrene

21) actinomycin D

22) any retroviral protease

28) E. coli enterotoxin protein

31) zeolites

32) cellulose

33) hydroxyapatite

34) DNA of a defined sequence

35) fibrin

36) tumor necrosis factor

37) specific monoclonal antibodies



A supply of several milligrams of pure target material is desired. Impure target material could be used, but one might obtain a protein that binds to a contaminant instead of to the target.

The following information about the target material is highly desirable:

1) stability as a function of temperature, pH, and ionic strength,

2) stability with respect to chaotropes such as urea or guanidinium Cl,

3) pI,

4) molecular weight,

5) requirements for prosthetic groups or ions, such as haem or Ca^2+, and

6) proteolytic activity, if any.

In addition to this most desirable information, it is useful to know: 1) the target's sequence, if the target is a macromolecule, 2) the 3D structure of the target, 3) enzymatic activity, if any, and 4) toxicity, if any.
The user of the present invention specifies certain parameters of the intended use of the binding protein:

1) the acceptable temperature range,

2) the acceptable pH range,

3) the acceptable concentrations of ions and neutral solutes,

4) the maximum acceptable dissociation constant for the target and the SBD:

$$K_d = \frac{[\mathrm{Target}][\mathrm{SBD}]}{[\mathrm{Target{:}SBD}]}$$

In some cases, the user may require discrimination between T, the target, and N, some non-target. Let

$$K_T = \frac{[T][\mathrm{SBD}]}{[T{:}\mathrm{SBD}]}, \qquad K_N = \frac{[N][\mathrm{SBD}]}{[N{:}\mathrm{SBD}]},$$

then

$$\frac{K_T}{K_N} = \frac{[T][N{:}\mathrm{SBD}]}{[N][T{:}\mathrm{SBD}]}.$$

The user then specifies a maximum acceptable value for the ratio K_T/K_N.
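The discrimination ratio follows directly from equilibrium concentrations, and the free [SBD] cancels. The sketch below computes K_T, K_N, and K_T/K_N from a hypothetical set of equilibrium concentrations and checks the ratio against a hypothetical user-specified maximum; all numerical values are illustrative.

```python
def dissociation_constant(free_ligand: float, free_sbd: float,
                          complex_conc: float) -> float:
    """K = [L][SBD] / [L:SBD], all concentrations in M."""
    return free_ligand * free_sbd / complex_conc


def discrimination_ratio(t: float, n: float, t_sbd: float, n_sbd: float) -> float:
    """K_T / K_N = ([T][N:SBD]) / ([N][T:SBD]); the free [SBD] cancels."""
    return (t * n_sbd) / (n * t_sbd)


# Hypothetical equilibrium concentrations (M) in a mixture of T, N and SBD.
sbd, t, n = 2e-8, 1e-6, 1e-6
t_sbd, n_sbd = 5e-8, 1e-10

k_t = dissociation_constant(t, sbd, t_sbd)   # about 4e-7 M
k_n = dissociation_constant(n, sbd, n_sbd)   # about 2e-4 M
ratio = discrimination_ratio(t, n, t_sbd, n_sbd)
max_ratio = 1e-2  # hypothetical user-specified maximum for K_T/K_N
print(k_t, k_n, ratio, ratio <= max_ratio)
```

A smaller K_T/K_N means the SBD binds the target more tightly than the non-target, which is why the user specifies a maximum rather than a minimum.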

The target material must be stable under the specified conditions of pH, temperature, and solution conditions.
If the target material is a protease, one must consider the following points:
1) a highly specific protease can be treated like any other target,

2) a general protease, such as subtilisin, may degrade the OSPs of the GP, including OSP-PBDs; there are several alternative ways of dealing with general proteases, including: a) a chemical inhibitor may be used to prevent proteolysis (e.g., phenylmethylsulfonyl fluoride (PMSF), which inhibits serine proteases), b) one or more active-site residues may be mutated to create an inactive protein (e.g., a serine protease in which the active serine is mutated to alanine), or c) one or more active-site amino acids of the protein may be chemically modified to destroy the catalytic activity (e.g., a serine protease in which the active serine is converted to anhydroserine),
3) SBDs selected for binding to a protease need not be inhibitors; SBDs that happen to inhibit the protease target are a fairly small subset of SBDs that bind to the protease target,

4) the more we modify the target protease, the less likely we are to obtain an SBD that inhibits the target protease, and

5) if the user requires that the SBD inhibit the target protease, then the active site of the target protease must not be modified any more than necessary; inactivation by mutation or chemical modification are preferred methods of inactivation, and a protein protease inhibitor becomes a prime candidate for IPBD. For example, BPTI could be mutated, by the methods of the present invention, to bind to proteases other than trypsin (TANK77 and TSCH87).

Sec. 12.0: Choice of GP(IPBD):

The user must pick a GP(IPBD) that is suitable to the chosen target according to the criteria of Sec. 2. It is anticipated that a small collection of GP(IPBD)s can be assembled such that, for any chosen target, at least one member of the collection will be a suitable starting point for engineering a protein that binds to the chosen target by the methods of the present invention.

If the pH, temperature, or other parameters of the intended use of the selected SBD differ markedly from the conditions used to optimize the affinity separation for the chosen GP(IPBD), then the user should optimize the affinity separation for conditions appropriate to the intended use by the methods described in Part II.

Sec. 13.0: Identification of a Family of PBDs, Related to IPBD, to Be Generated

Sec. 13.1: Choosing residues of IPBD to vary:

We choose residues in the IPBD to vary through consideration of several factors, including: a) the 3D structure of the IPBD, b) sequences homologous to IPBD, and c) modeling of the IPBD and mutants of the IPBD. Because the number of residues that could strongly influence binding is always greater than the number that can be varied simultaneously, the user must pick a subset of those residues to vary at one time. The user must also pick trial levels of variegation and calculate the abundances of various sequences. The set of varied residues and the level of variegation at each varied residue are adjusted until the composite variegation is commensurate with C_sensi.

We now consider the principles that guide our choice of residues of the IPBD to vary. A key concept is that only structured proteins exhibit specific binding, i.e., can bind to a particular chemical entity to the exclusion of most others. Thus the residues to be varied are chosen with an eye to preserving the underlying IPBD structure. Substitutions that prevent the PBD from folding will cause GPs carrying those genes to bind indiscriminately, so that they can easily be removed from the population.

Burial of hydrophobic surfaces so that bulk water is excluded is one of the strongest forces driving the binding of proteins to other molecules. Bulk water can be excluded from the region between two molecules only if the surfaces are complementary. We want to test as many different surfaces as possible to find one that is complementary to the target. The selection-through-binding isolates those proteins that are more nearly complementary to some surface on the target. The effective diversity of a variegated population is measured by the number of different surfaces, rather than the number of protein sequences. Thus we should maximize the number of surfaces generated in our population, rather than the number of protein sequences.

In hypothetical example I, we consider a hypothetical PBD, shown in Figure 4, binding to a hypothetical target. Figure 4 is a 2D schematic of 3D objects; by hypothesis, residues 1, 2, 4, 6, 7, 13, 14, 15, 20, 21, 22, 27, 29, 31, 33, 34, 36, 37, 38, and 39 of the IPBD are on the 3D surface of the IPBD, even though shown well inside the circle. Proteins do not have distinct, countable faces. Therefore we define an "interaction set" to be a set of residues such that all members of the set can simultaneously touch one molecule of the target material without any atom of the target coming closer than van der Waals distance to any main-chain atom of the IPBD. The concept of a residue "touching" a molecule of the target is discussed below. One hypothetical interaction set, Set A, in Figure 4 comprises residues 6, 7, 20, 21, 22, 33, and 34, represented by squares. Another hypothetical interaction set, Set B, comprises residues 1, 2, 4, 6, 31, 37, and 39, represented by circles.

If we vary one residue, number 21 for example, through all twenty amino acids, we obtain 20 protein sequences and 20 different surfaces for interaction set A. Note that residue 6 is in two interaction sets, and variation of residue 6 through all 20 amino acids yields 20 versions of interaction set A and 20 versions of interaction set B.

Now consider varying two residues, each through all twenty amino acids, generating 400 protein sequences. If the two residues varied were, for example, number 1 and number 21, then there would be only 40 different surfaces, because interaction set A does not depend on residue 1 and interaction set B does not depend on residue 21. If the two residues varied, however, were number 7 and number 21, then 400 surfaces would be generated.
If N spatially separated residues are varied at one time, 20 × N surfaces are generated. Variation of N residues in the same interaction set yields 20^N surfaces. For example, if N = 6, variation of separated residues yields 120 surfaces, while variation of interacting residues yields 20^6 ≈ 6.4 × 10^7 surfaces. Thus, to maximize the number of surfaces generated when N residues are varied, all residues should be in the same interaction set, because variation of several residues in one interaction set generates an exponential number of surfaces while variation of spatially separated surface residues generates only a linear number.
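The surface counts in hypothetical example I follow a simple rule: an interaction set containing k varied residues shows 20^k versions, and sets with no varied residue add nothing new. The sketch below encodes that counting rule using the two hypothetical interaction sets of Figure 4; it reproduces the numbers in the text but is only a counting aid, not a structural model.

```python
from typing import Dict, Iterable, Set

# Hypothetical interaction sets A and B from Figure 4 (hypothetical example I).
INTERACTION_SETS: Dict[str, Set[int]] = {
    "A": {6, 7, 20, 21, 22, 33, 34},
    "B": {1, 2, 4, 6, 31, 37, 39},
}


def count_surfaces(varied: Iterable[int],
                   sets: Dict[str, Set[int]] = INTERACTION_SETS,
                   alternatives: int = 20) -> int:
    """Distinct surfaces when each varied residue takes all 20 amino acids.

    An interaction set containing k varied residues shows alternatives**k
    versions; sets with no varied residue contribute nothing new.
    """
    varied_set = set(varied)
    total = 0
    for members in sets.values():
        k = len(members & varied_set)
        if k:
            total += alternatives ** k
    return total


print(count_surfaces([21]), count_surfaces([1, 21]), count_surfaces([7, 21]))
# Six residues all inside interaction set A:
print(count_surfaces([7, 20, 21, 22, 33, 34]))
```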

The amount of surface area buried in strong protein-protein interactions ranges from 1000 Å^2 to 2000 Å^2, as summarized by Schulz and Schirmer (SCHU79, p103ff). Individual amino acids have total surface areas that depend mostly on the type of amino acid and weakly on conformation. These areas range from about 80 Å^2 for glycine to about 260 Å^2 for tryptophan. Averages of total surface area by amino acid type, and maximum exposed surface area of each amino acid type for two typical proteins, hen egg white lysozyme (HEWL) and T4 lysozyme (T4L), are shown in Table 6. From these exposures, one can calculate that 1000 Å^2 on a protein surface comprises between 4 and 30 amino acids, depending on the amino acid types and the protein 3D structure. Varied amino acid sequences, as found in actual proteins, involve between 10 and 25 residues in forming 1000 Å^2 of protein surface. Schulz and Schirmer estimate that 100 Å^2 of protein surface can exhibit as many as 1000 different specific patterns (SCHU79, p105). The number of surface patterns rises
exponentially with the area that can be varied independently. One of the BPTI structures recorded in the Brookhaven Protein Data Bank (6PTI), for example, has a total exposed surface area of 3997 Å^2 (using the method of Lee and Richards (LEER71), a solvent radius of 1.4 Å, and atomic radii as shown in Table 7). If we could vary this surface freely, and if 100 Å^2 can produce 1000 patterns, we could construct roughly 1000^40, or about 10^120, different patterns by varying the surface of BPTI. This calculation is intended only to suggest the huge number of possible surface patterns based on a common protein backbone.
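The BPTI estimate treats the surface as independent 100 Å^2 patches, each able to show 1000 patterns, so the count is 1000 raised to (area/100). Working in log10 avoids overflow; the sketch below just evaluates that stated assumption for the 3997 Å^2 figure.

```python
import math


def log10_patterns(total_area: float,
                   patch_area: float = 100.0,
                   patterns_per_patch: float = 1000.0) -> float:
    """log10 of the pattern count if every patch of the surface varied independently."""
    patches = total_area / patch_area
    return patches * math.log10(patterns_per_patch)


# BPTI (6PTI): 3997 square angstroms of exposed surface.
print(round(log10_patterns(3997.0), 2))
```

The result, just under 120, matches the order of magnitude quoted; as the next paragraph explains, a real backbone cannot vary its patches independently.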

One protein framework cannot, however, display all possible patterns over any one particular 100 Å^2 of surface merely by replacement of the side groups of surface residues. The protein backbone holds the varied side groups in approximately constant locations, so that the variations are not independent. We can, nevertheless, generate a vast collection of different protein surfaces by varying those protein residues that face the outside of the protein.

Figure 5 shows BPTI in contact with myoglobin. From this we can see that residues 3, 7, 8, 10, 13, 39, 41, and 42 can all simultaneously contact a molecule of the size and shape of myoglobin. Figure 5 also shows that residue 49 cannot touch a single myoglobin molecule simultaneously with any of the first set, even though all are on the surface of BPTI. It is not the intent of the present invention, however, to use models to determine which part of the target molecule will actually be the site of binding by the PBD.
If cassette mutagenesis is picked, the protein residues to be varied are, preferably, close enough together in sequence that the variegated DNA (vgDNA) encoding all of them can be made in one piece. The present invention is not limited to a particular length of vgDNA that can be synthesized. With current technology, a stretch of 60 amino acids (180 DNA bases) can be spanned.

Further, when there is reason to mutate residues further than sixty residues apart, one can use other mutational means, such as single-stranded-oligonucleotide-directed mutagenesis (BOTS85) using two or more mutating primers.

Alternatively, to vary residues separated by more than sixty residues, two cassettes may be mutated as follows:

1) vgDNA having a low level of variegation (for example, 20- to 400-fold variegation) is introduced into one cassette in the OCV,

2) cells are transformed and cultured,

3) vg OCV DNA is obtained,

4) a second segment of vgDNA is inserted into a second cassette in the OCV, and

5) cells are transformed and cultured, and GPs are harvested and subjected to selection-through-binding.
The composite level of variation must not exceed the prevailing capabilities to a) produce very large numbers of independently transformed cells or b) detect small components in a highly varied population. The limits on the level of variegation are discussed in Sec. 13.2.
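For two independently variegated cassettes, the composite level of variation is the product of the per-cassette levels. The sketch below computes that product and applies a simplified version of the two limits named above; reading limit (b) as "composite level must not exceed C_sensi" is an interpretation, and the capacity numbers in the example are hypothetical (the actual limits are the subject of Sec. 13.2).

```python
from math import prod
from typing import Sequence


def composite_variegation(levels: Sequence[int]) -> int:
    """Number of distinct combinations when independent cassettes are varied."""
    return prod(levels)


def within_capabilities(levels: Sequence[int],
                        max_transformants: float,
                        c_sensi: float) -> bool:
    """Hypothetical check of the two limits named in the text.

    a) the library must not exceed the number of independent transformants
       that can be produced, and b) each member, present at about 1/N of the
       population, must still be detectable, taken here as N <= C_sensi.
    """
    n = composite_variegation(levels)
    return n <= max_transformants and n <= c_sensi


# Two cassettes at 400-fold each (the second level is hypothetical).
print(composite_variegation([400, 400]))
print(within_capabilities([400, 400], max_transformants=1e8, c_sensi=1e6))
```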

Here we assemble the data about the IPBD and the target that are useful in deciding which residues to vary in the variegation cycle:

1) 3D structure, or at least a list of residues on the surface of the IPBD,

2) list of sequences homologous to IPBD, and

3) model of the target molecule or a stand-in for the target.

These data and an understanding of the behavior of different amino acids in proteins will be used to answer two questions:

1) which residues of the IPBD are on the outside and close enough together in space to touch the target simultaneously?

2) which residues of the IPBD can be varied with high probability of retaining the underlying IPBD structure?
Although an atomic model of the target material
(obtained through X-ray crystallography, NMR, or other
means) is preferred in such examination, it is not
strictly necessary. For example, if the target were a
protein of unknown 3D structure, it would be sufficient
to know the molecular weight of the protein and whether
it were a soluble globular protein, a fibrous protein,
or a membrane protein. Physical measurements, such as
low-angle neutron diffraction, can determine the overall
molecular shape, viz. the ratios of the principal
moments of inertia. One can then choose a protein of
known structure of the same class and similar size and
shape to use as a molecular stand-in and yardstick. It
is not essential to measure the moments of inertia of
the target because, at low resolution, all proteins of
a given size and class look much the same. The
specific volumes are the same, all are more or less
spherical, and therefore all proteins of the same size
and class have about the same radius of curvature. The
radii of curvature of the two molecules determine how
much of the two molecules can come into contact.
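The argument that proteins of a given size and class share about the same radius of curvature can be made concrete by treating a globular protein as a sphere whose volume follows from its molecular weight. The sketch below is only an illustration under stated assumptions. The partial specific volume of 0.73 cm³/g is an assumed typical value for globular proteins and does not come from this text. The function name `sphere_radius_angstrom` is hypothetical.

```python
import math

AVOGADRO = 6.02214076e23  # molecules per mole
CM3_TO_A3 = 1e24          # cubic angstroms per cubic centimetre


def sphere_radius_angstrom(mol_weight_da, partial_specific_volume=0.73):
    """Radius (angstroms) of a sphere with the volume of one protein molecule.

    mol_weight_da: molecular weight in daltons (g/mol).
    partial_specific_volume: cm^3/g; 0.73 is an assumed typical value for
    globular proteins, not a figure taken from the text.
    """
    volume_a3 = mol_weight_da * partial_specific_volume / AVOGADRO * CM3_TO_A3
    return (3.0 * volume_a3 / (4.0 * math.pi)) ** (1.0 / 3.0)


if __name__ == "__main__":
    # A protein of about 17 kDa, roughly the size of myoglobin
    print(round(sphere_radius_angstrom(17_000), 1))
```

For a 17 kDa protein this gives a radius of about 17 Å. Because the radius grows only as the cube root of molecular weight, doubling the size of the target increases its radius of curvature by only about 26%.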

Several graphical and computational tools are
needed or useful. The most appropriate method of
picking the residues of the protein chain at which the
amino acids should be varied is by viewing, with
interactive computer graphics, a model of the IPBD. A
stick-figure representation of molecules is preferred.
A suitable set of hardware is an Evans & Sutherland
PS390 graphics terminal (Evans & Sutherland
Corporation, Salt Lake City, UT) and a MicroVAX II
supermicro computer (Digital Equipment Corp., Maynard,
MA). The computer should, preferably, have at least
150 megabytes of disk storage, so that the Brookhaven
Protein Data Bank can be kept on line. A FORTRAN
compiler, or some equally good higher-level language
processor, is preferred for program development.

Suitable programs for viewing and manipulating protein
models include: a) PS-FRODO, written by T. A. Jones
(JONE85) and distributed by the Biochemistry Department
of Rice University, Houston, TX; and b) PROTEUS,
developed by Dayringer, Tramontano, and Fletterick
(DAYR86). Important features of PS-FRODO and PROTEUS
that are needed to view and manipulate protein models
for the purposes of the present invention are the
abilities to: 1) display molecular stick figures of
proteins and other molecules, 2) zoom and clip images
in real time, 3) prepare various abstract
representations of the molecules, such as a line
joining Cα atoms and side group atoms, 4) compute and
display solvent-accessible surfaces reasonably quickly,
5) point to and identify atoms, and 6) measure distance
between
In addition, one could use theoretical
calculations, such as dynamic simulations of proteins,
to estimate whether a substitution at a particular
residue of a particular amino-acid type might produce a
protein of approximately the same 3D structure as the
parent protein. Such calculations might also indicate
whether a particular substitution will greatly affect
the flexibility of the protein; calculations of this
sort may be useful but are not required.
Sec. 13.1.1: The principal set

In this section we pick a principal set of
residues of the IPBD to vary. Using the knowledge of
which residues are on the surface of the IPBD (as noted
above), we pick residues that are close enough together
on the surface of the IPBD to touch a molecule of the
target simultaneously without having any IPBD
main-chain atom come closer than van der Waals distance
(viz. 4.0 to 5.0 Å) from any target atom. For the
purposes of the present invention, a residue of the
IPBD "touches" the target if: a) a main-chain atom is
within van der Waals distance, viz. 4.0 to 5.0 Å, of any
atom of the target molecule, or b) the Cβ is within
Dcutoff of any atom of the target molecule, so that a
side-group atom could make contact with that atom.
Because side groups differ in size (cf. Table 35), some
judgment is required in picking Dcutoff. In the
preferred embodiment, a single fixed value of Dcutoff is
used; values in the range 6.0 Å to 10.0 Å could be
used. If the IPBD has G at a residue, we construct a
pseudo Cβ with the correct bond distance and angles
and judge the ability of the residue to touch the
target from this pseudo Cβ.
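The two-part "touches" test, together with the pseudo Cβ used for glycine, can be sketched with NumPy. This is a minimal illustration, not the program used in the original work, and it rests on three assumptions. First, the pseudo Cβ is placed with a commonly used empirical construction from the backbone N, Cα and C atoms, which the text does not specify. Second, "within van der Waals distance" is taken as at most 5.0 Å, the upper end of the stated range. Third, the default `d_cutoff=8.0` is a hypothetical choice inside the 6.0 to 10.0 Å range. The function names are illustrative.

```python
import numpy as np

# Empirical coefficients for placing a virtual C-beta from backbone atoms
# (a widely used idealized-geometry construction; an assumption here).
_CROSS, _BOND, _NEXT = -0.58273431, 0.56802827, -0.54067466


def pseudo_cbeta(n, ca, c):
    """Place a pseudo C-beta for a glycine residue from its N, CA and C atoms."""
    b = ca - n
    cvec = c - ca
    a = np.cross(b, cvec)
    return _CROSS * a + _BOND * b + _NEXT * cvec + ca


def touches_target(main_chain, cbeta, target, vdw_max=5.0, d_cutoff=8.0):
    """Two-part 'touches' test for one IPBD residue (coordinates in angstroms).

    main_chain: (k, 3) main-chain atoms of the residue
    cbeta:      (3,) C-beta or pseudo C-beta position
    target:     (m, 3) all target atoms
    """
    # a) any main-chain atom within van der Waals distance of any target atom
    d_main = np.linalg.norm(main_chain[:, None, :] - target[None, :, :], axis=-1)
    if np.any(d_main <= vdw_max):
        return True
    # b) C-beta close enough that a side-group atom could reach a target atom
    d_cb = np.linalg.norm(target - cbeta, axis=-1)
    return bool(np.any(d_cb <= d_cutoff))
```

Rule b) is a cheap proxy for side-group reach. This is why Dcutoff has to be chosen with side-group size in mind: a small Dcutoff effectively credits only short side groups with contact.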

Alternatively, we choose a set of residues on the
surface of the IPBD such that the curvature of the
surface defined by the residues in the set is not so
great that it would prevent contact between all
residues in the set and a molecule of the target. This
method is appropriate if the target is a macromolecule,
such as a protein, because the PBDs derived from the
IPBD will contact only a part of the macromolecular
surface. The surfaces of macromolecules are irregular
with varying curvatures. If we pick residues that
define a surface that is not too convex, then there
will be a region on a macromolecular target with a
compatible curvature.
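One possible way to put a number on "not too convex" is to fit a sphere through the positions (for example, the Cβ atoms) of a candidate residue set. A small fitted radius then indicates a strongly convex patch. The text does not prescribe this method. The least-squares sphere fit below is a swapped-in illustration, and the function name is hypothetical.

```python
from typing import Tuple

import numpy as np


def fit_sphere(points) -> Tuple[np.ndarray, float]:
    """Least-squares sphere through 3D points; returns (center, radius).

    Uses the linearized form x^2 + y^2 + z^2 = 2*a*x + 2*b*y + 2*c*z + d,
    so the center is (a, b, c) and the radius is sqrt(d + a^2 + b^2 + c^2).
    Needs at least four points that are not coplanar.
    """
    pts = np.asarray(points, dtype=float)
    design = np.hstack([2.0 * pts, np.ones((len(pts), 1))])
    rhs = np.sum(pts ** 2, axis=1)
    sol, *_ = np.linalg.lstsq(design, rhs, rcond=None)
    center = sol[:3]
    radius = float(np.sqrt(sol[3] + center @ center))
    return center, radius
```

Under this heuristic, the fitted radius of a candidate set could be compared with the radius of curvature expected for the target or its stand-in.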

In addition to the geometrical criteria, we prefer
that there be some indication that the underlying IPBD
structure will tolerate substitutions at each residue
in the principal set of residues. Indications could
come from various sources, including: a) homologous
sequences, b) static computer modeling, or c) dynamic

The residues in the principal set need not be
contiguous in the protein sequence. The exposed
surfaces of the residues to be varied do not need to be
connected. We require only that the amino acids in the
residues to be varied all be capable of touching a
molecule of the target material simultaneously without
having atoms overlap. If the target were, for example,
horse heart myoglobin, and if the IPBD were BPTI, any
set of residues in one interaction set of BPTI defined
in Table 34 could be picked.

Preferably, the principal set contains eight to
sixteen residues. This number of residues allows
sufficient variability that a surface that is
complementary to the target can be found, but is small
enough that a significant fraction of the surface can
be varied at one time.

Sec. 13.1.2: The secondary set

The secondary set comprises those residues not in the
primary set that touch residues in the primary set.
These residues might be excluded from the primary set
because: a) the residue is internal, b) the residue is
highly conserved, or c) the residue is on the surface,
but the curvature of the IPBD surface prevents the
residue from being in contact with the target at the
same time as one or more residues in the primary set.
Internal residues are frequently conserved, and the
amino acid type can not be changed to a significantly
different type without substantial risk that the
protein structure will be disrupted. Nevertheless,
some conservative changes of internal residues, such as
I to L or F to Y, are tolerated. Such conservative
changes affect the detailed placement and dynamics of
adjacent protein residues, and such variation may be
useful once an SBD is found.

Surface residues in the secondary set are most
often located on the periphery of the principal set.
Such peripheral residues can not make direct contact
with the target simultaneously with all the other
residues of the principal set. The charge on the amino
acid in one of these residues could, however, have a
strong effect on binding. Once an SBD is found, it is
appropriate to vary the charge of some or all of these
residues. For example, the variegated codon containing
equimolar A and G at base 1, equimolar C and A at base
2, and A at base 3 yields amino acids T, A, K, and E
with equal probability.
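The claim about the A/G, C/A, A codon can be checked by enumerating every codon that a variegated codon can produce and weighting each codon by the product of its base fractions. The sketch below assumes the standard genetic code and independent mixing at each base. The names `CODE` and `aa_distribution` are hypothetical.

```python
from collections import Counter
from itertools import product

BASES = "TCAG"
# Standard genetic code in TCAG order (TTT, TTC, TTA, TTG, TCT, ...); '*' = stop
AA = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODE = {"".join(codon): aa for codon, aa in zip(product(BASES, repeat=3), AA)}


def aa_distribution(base1, base2, base3):
    """Fraction of vgDNA encoding each amino acid at one variegated codon.

    Each argument maps a nucleotide to its fraction at that base.
    """
    dist = Counter()
    for (n1, f1), (n2, f2), (n3, f3) in product(base1.items(), base2.items(), base3.items()):
        dist[CODE[n1 + n2 + n3]] += f1 * f2 * f3
    return dict(dist)


if __name__ == "__main__":
    # Equimolar A/G at base 1, equimolar C/A at base 2, A at base 3
    print(aa_distribution({"A": 0.5, "G": 0.5}, {"C": 0.5, "A": 0.5}, {"A": 1.0}))
```

The four codons ACA, AAA, GCA and GAA encode T, K, A and E, each with a fraction of 0.25. This mixture therefore varies charge (K versus E) while keeping two small neutral options.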



Sec. 13.1.3: Choice of residues to vary initially

Choice of residues in the primary and secondary
set is based on: a) geometry of the IPBD and the
geometrical relationship between the IPBD and the
target (or a stand-in for the target) in a hypothetical
complex, and b) sequences of proteins homologous to the
IPBD. In this section we pick a subset of the residues
in the primary and secondary sets, based on geometry
and on the maximum allowed level of variegation that
assures progressivity. The allowed level of
variegation determines how many residues can be varied
at once; geometry determines which ones.
The user may pick residues to vary in many ways;
the following is a preferred manner. Pairs of residues
are picked that are diametrically opposed across the
face of the principal set. Two such pairs are used to
delimit the surface, up/down and right/left.
Alternatively, three residues that form an inscribed
triangle, having as large an area as possible, on the
surface are picked. One to three other residues are
picked in a checkerboard fashion across the interaction
surface. Choice of widely spaced residues to vary
creates the possibility for high specificity because
all the intervening residues must have acceptable
complementarity before favorable interactions can occur
at widely separated residues.
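The two geometric picks described above can be sketched by brute force: the pair of residues that lie farthest apart, and the three residues that span the largest triangle. Exhaustive enumeration is cheap for a principal set of eight to sixteen residues. This sketch makes two assumptions. Each residue is represented by a single point, such as its Cβ position. The helper names are hypothetical.

```python
from itertools import combinations

import numpy as np


def farthest_pair(coords):
    """Pair of residues whose positions are farthest apart.

    coords maps residue number -> xyz position (e.g. of the C-beta atom).
    """
    return max(combinations(coords, 2),
               key=lambda pair: np.linalg.norm(coords[pair[0]] - coords[pair[1]]))


def largest_triangle(coords):
    """Three residues that span the triangle of largest area."""
    def area(triple):
        a, b, c = (coords[r] for r in triple)
        return 0.5 * np.linalg.norm(np.cross(b - a, c - a))
    return max(combinations(coords, 3), key=area)
```

A second, roughly perpendicular pair could be found the same way after removing the first pair's residues. The remaining "checkerboard" residues are still a judgment call made at the graphics terminal.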

The number of residues picked is coupled to the
range through which each can be varied by the
restrictions discussed in Sec. 13.2. In the first
round, we do not assume any binding between IPBD and
the target, and so progressivity is not an issue. At
the first round, the user may elect to produce a level
of variegation such that each molecule of vgDNA is
potentially different through, for example, unlimited
variegation of 10 codons (20^10 ≈ 10^13). One
run of the DNA synthesizer produces approximately 10^?
molecules of length 100 nts. Inefficiencies in
ligation and transformation will reduce the number of
proteins actually tested to between 10^? and 5 x 10^?.
Multiple replications of the process with such very
high levels of variegation will not yield repeatable
results; the user must decide whether this is
important.



Range of variation at each site

Having picked which residues to vary, we must now
decide the range of amino acids to allow at each
variable residue. The total level of variegation is
the product of the number of variants at each varied
residue. Each varied residue can have a different
scheme of variegation, producing 2 to 20 different
possibilities. We require that the process be
progressive, i.e. each variegation cycle produces a
better starting point for the next variegation cycle
than the previous cycle produced.
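The statement that the total level of variegation is the product of the number of variants at each varied residue amounts to one line of arithmetic. It is sketched here with a hypothetical helper and with schemes taken from the surrounding discussion.

```python
import math


def total_variegation(variants_per_residue):
    """Number of distinct sequences programmed by one variegation scheme.

    variants_per_residue: how many amino acids (2 to 20) each varied residue may take.
    """
    return math.prod(variants_per_residue)


if __name__ == "__main__":
    print(total_variegation([20] * 5))    # five residues through all 20 amino acids
    print(total_variegation([20] * 6))    # six residues
    print(total_variegation([4, 4, 20]))  # a mixed scheme
```

Five fully varied residues give 3.2 x 10^6 sequences and six give 6.4 x 10^7. Each additional fully varied residue multiplies the library by 20, which is why the number of residues and the range at each residue must be traded against each other.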

Setting the level of variegation such that the
PPBD and many sequences related to the PPBD sequence
are present in detectable amounts ensures that the
process is progressive. If the level of variegation
is so high that the PPBD sequence is present at such
low levels that there is an appreciable chance that no
transformant will display the PPBD, then the best SBD
of the next round could be worse than the PPBD. At
excessively high levels of variegation, each round of
mutagenesis is independent of previous rounds, and
there is no assurance of progressivity. This approach
can lead to valuable binding proteins, but repetition
of experiments with this level of variegation will not
yield progressive results. Excessive variation is not
preferred.

Hypothetical example 2 considers the effects of
the level of variegation on the progressivity of the
process of the present invention. Figure 6 is a
schematic view of a hypothetical eight-residue binding
surface of a PBD comprising residues 11, 24, 25, 30,
34, 42, 44, and 47 of a hypothetical protein. Each
polygon represents the exposed portion of one residue.
By hypothesis, there exists at least one protein, shown
in Figure 6e, having a specific amino acid in each of
the eight residues that will bind to the target, but we
do not, at first, know what that sequence is.

The IPBD, shown in Figure 6a, may have none of the
optimal amino acids on its surface. Because we begin
with no information, our initial estimate is that all
amino acids have equal likelihood of being the best at
each of the eight residues.



By hypothesis, the genetic engineering system of
hypothetical example 2 has Mntv = 10^7, and the
selection-through-binding system has sensitivity
Csensi. Also by hypothesis, the variegation method can
produce all amino acids at a given residue with equal
probability.

In the first variegation, we vary residues 11, 24,
25, 34, and 44 through all twenty amino acids,
producing 20^5 = 3.2 x 10^6 sequences. The capabilities
of the genetic engineering system allow all these
sequences to be present in the selection step, and the
selection system can detect 1 GP in 10^?. By
hypothesis, we isolate a GP carrying an sbd gene that
encodes the first SBD, shown in Figure 6b, that has
improved binding for the target and has the amino acid
sequence W11-F24-E25-G30-D34-E42-P44-T47. This amino
acid sequence becomes the parental sequence to the next
variegation. After the first variegation and
selection, the evidence favors W11, F24, E25, D34, and
P44 as optimal amino acids at their respective
residues. That residues 30, 42, and 47 were not varied
has two implications:

1) we still have no information about which amino
acid is optimal at these residues, and

2) the amino acids selected at the varied residues
are optimal, given the identities of the amino
acids in the non-varied residues; when residues
30, 42, and 47 are varied, our estimate of the
optimal amino acids in other residues may change.

Now consider two versions of a variegation that
take the first intermediate SBD as parent and that
might get us closer to the optimal SBD.

In the first version of the second variegation, we
vary only five residues, producing 3.2 x 10^6 sequences,
all of which are expressed and subjected to
selection-through-binding. We vary residues 30, 42, and
47 because they were not varied previously. We also
vary two other residues so that as many surfaces as
possible are tested; residues 24 and 34 are chosen.
Suppose that we isolate a GP that carries an sbd gene
encoding the amino acid sequence
W11-L24-E25-I30-D34-R42-P44-K47, shown in Figure 6c.
Consider the reason that D is retained at residue 34.
We know that all the sequences
W11-L24-E25-I30-x34-R42-P44-K47 (where x runs through
all twenty amino acids) were tested and therefore can
conclude with improved confidence that D34 is optimal,
given the rest of the selected sequence. Now consider
the change at residue 24 from F to L. We know that all
the sequences W11-x24-E25-I30-D34-R42-P44-K47 were
tested, and we can conclude that L24 is optimal, given
the rest of the sequence. At each of the varied
residues, we gain information about which amino acids
are optimal at each varied residue under the conditions
imposed.

In the second version, we will vary residues 11,
24, 30, 34, 42, and 47, each through all twenty amino
acids, producing 20^6 = 6.4 x 10^7 possible different
sequences. Our hypothesis is that only 1.0 x 10^7 of
those sequences are produced and subjected to
selection. Because only 15.6% of the programmed
sequences are actually subjected to selection, it is
likely that the parental sequence,
W11-F24-E25-G30-D34-E42-P44-T47, is not present in the
selection step, and there is, consequently, no
assurance that the best SBD binds more tightly to the
target than did the parental PBD. Suppose that we
isolate a GP that carries an sbd gene encoding the
amino acid sequence V11-R24-E25-Q30-D34-R42-P44-D47,
shown in Figure 6d. Consider the reason that D is
retained at residue 34. Is it that D is optimal, or is
it that, by chance, the sequence encoding the optimal
amino acid, x, was not present as
V11-R24-E25-Q30-x34-R42-P44-D47 in the sample? We do
not know and therefore can not conclude that D34 is
optimal. Furthermore, retaining an amino acid can not
move us toward the optimal sequence. Now consider the
change at residue 24 from F to R. Was
V11-R24-E25-Q30-D34-R42-P44-D47 selected because R24 is
optimal in the presence of V11, E25, Q30, D34, R42, P44,
and D47, or was V11-R24-E25-Q30-D34-R42-P44-D47 selected
because V11-F24-E25-Q30-D34-R42-P44-D47 was not present
to be selected? Again, we do not know and can not
conclude that R24 is an improvement, i.e. we can not
conclude that R24 is more likely to be optimal than is
F24. In both cases, we lose information about which
amino acids belong at each residue. We may have
obtained an SBD with superior binding to the target.
Another variegation cycle at this level of variegation,
however, may produce a better protein or a worse
protein, and the process is not progressive.
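The 15.6% figure is the ratio of sequences tested (1.0 x 10^7) to sequences programmed (6.4 x 10^7). The sketch below adds an assumption that the text does not make: transformants are treated as independent random draws from an equiprobable library, so the count of any one sequence is Poisson-distributed. Under that assumption, the chance that the parental sequence appears even once is about 14% in version 2. By contrast, it is about 96% if the 3.2 x 10^6-member library of version 1 were sampled with the same 10^7 transformants. The function names are hypothetical.

```python
import math


def programmed_fraction_tested(n_programmed, n_tested):
    """Ratio of tested transformants to programmed sequences, capped at 1."""
    return min(1.0, n_tested / n_programmed)


def prob_sequence_present(abundance, n_tested):
    """Chance that a sequence of the given abundance appears at least once among
    n_tested independent transformants (Poisson approximation)."""
    return 1.0 - math.exp(-abundance * n_tested)


if __name__ == "__main__":
    print(programmed_fraction_tested(20 ** 6, 1.0e7))  # version 2: 0.15625
    print(prob_sequence_present(1 / 20 ** 6, 1.0e7))   # parental sequence, version 2
    print(prob_sequence_present(1 / 20 ** 5, 1.0e7))   # any one sequence, version 1 library
```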

Let us contrast versions 1 and 2 of the second
variegation. In version 1, we retained more
information, viz. that W11 allows improved binding, and
therefore our selection of K47 incorporates the
information obtained in the previous rounds. In
version 2 of the second variegation, we discarded the
information that W11 allows stronger binding than Y11.

Progressivity is not an all-or-nothing property.
So long as most of the information obtained from
previous variegation cycles is retained and many
different surfaces that are related to the PPBD surface
are produced, the process is progressive. If the level
of variegation is so high that the PPBD gene may not be
detected, the assurance of progressivity diminishes.
If the probability of recovering the PPBD is negligible,
then the probability of progressive behavior is also
negligible.

An opposing force in our design considerations is
that PBDs are useful in the population only up to the
amount that can be detected; any excess above the
detectable amount is wasted. Thus we produce as many
surfaces related to the PPBD as possible within the
constraint that the PPBD be detectable.

We defer specification of exactly how much
variegation is allowed until we have: a) specified real
nt distributions for a variegated codon, and b)
examined the effects of discrepancies between specified
nt distributions and actual nt distributions.






Sec. 13.3: Generation of vgDNA Encoding PBD Family:

We must now decide how to distribute the
variegation within the codons for the residues to be
varied. These decisions are influenced by the nature
of the genetic code. When vgDNA is synthesized,
variation at the first base of a codon creates a
population containing amino acids from the same column
of the genetic code table (as shown in Table 3-6 on
p87 of WATS87); variation at the second base of the
codon creates a population containing amino acids from
the same row of the genetic code table; variation at
the third base of the codon creates a population
containing amino acids from the same box. If two or
three bases in the same codon are varied, the pattern
is more complicated. Work with 3D protein structural
models may suggest definite sets of amino acids to
substitute at a given residue, but the method of
variation may require either more or fewer kinds of
amino acids to be included. For example, examination of
a model might suggest substitution of N or Q at a given
residue. Combinatorial variation of codons requires
that mixing N and Q at one location also include K and
H as possibilities at the same residue. One must
choose to put: 1) N only, 2) Q only, or 3) a mixture of
N, K, H, and Q. The present invention does not rely on
accurate predictions of which amino acids should be
placed at each residue; rather, attention is focused on
which residues should be varied.
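The N/Q example follows from the way combinatorial synthesis works. Each base of a variegated codon is an independent mixture, so the codon that can encode two amino acids is the per-base union of one codon for each, and every combination of those bases gets made. The brute-force sketch below (hypothetical name `smallest_mixed_codon`, standard genetic code assumed) picks one codon per wanted amino acid so as to minimize the number of codons in that union. It then reports everything the resulting mixture encodes.

```python
from itertools import product

BASES = "TCAG"
AA = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODE = {"".join(codon): aa for codon, aa in zip(product(BASES, repeat=3), AA)}
CODONS_FOR = {}
for codon, aa in CODE.items():
    CODONS_FOR.setdefault(aa, []).append(codon)


def smallest_mixed_codon(wanted):
    """Per-base nucleotide sets of the smallest mixed codon that encodes every
    amino acid in `wanted`, plus the full set it encodes ('*' = stop)."""
    best = None
    for choice in product(*(CODONS_FOR[aa] for aa in wanted)):
        # Mixing at each base is the union of the bases used by the chosen codons.
        sets = [sorted({codon[pos] for codon in choice}) for pos in range(3)]
        n_codons = len(sets[0]) * len(sets[1]) * len(sets[2])
        if best is None or n_codons < best[0]:
            encoded = {CODE["".join(c)] for c in product(*sets)}
            best = (n_codons, sets, encoded)
    return best[1], best[2]


if __name__ == "__main__":
    print(smallest_mixed_codon("NQ"))  # mixture also encodes K and H
```

Asking for N and Q returns a four-codon mixture that necessarily encodes K and H as well. This is the trade-off the text describes between using one amino acid only and accepting the whole mixture.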

There are many ways to generate diversity in a
protein. (See RICH86, CARU85, and OLIP86.) One extreme
case is that one or a few residues of the protein are
varied as much as possible (inter alia see CARU85,
CARU87, RICH86, and WHAR86). We will call this limit
"Focused Mutagenesis". Focused Mutagenesis is
appropriate when the IPBD or other PPBD shows little or
no binding to the target, as at the beginning of the
search for a protein to bind to a new target material.
When there is no binding between the PPBD and the
target, we preferably pick a set of five to seven
residues on the surface and vary each through all 20
possibilities.

An alternative plan of mutagenesis ("Diffuse
Mutagenesis") that may be useful is to vary many more
residues through a more limited set of choices (see
Vershon et al., Ch. 15 of INOU86, and PAKU86). This can
be accomplished by spiking each of the pure nts
activated for DNA synthesis (e.g. nt-phosphoramidites)
with a small amount of one or more of the other
activated nts. Contrary to general practice, the
present invention sets the level of spiking so that
only a small percentage (from 1% down to a small
fraction of one percent, for example) of the final
product will contain the initial DNA sequence. This
will ensure that many single, double, triple, and
higher mutations occur, but that recovery of the basic
sequence will be a possible outcome. Let Nb be the
number of bases to be varied, and let Q be the fraction
of all sequences that should have the parental
sequence; then M, the fraction of the mixture that is
the majority component, is

$$M = \exp\left(\frac{\ln Q}{N_b}\right) = 10^{\log_{10}(Q)/N_b}$$

If, for example, thirty base pairs on the DNA
chain were to be varied and 1% of the product is to
have the parental sequence, then each mixed nt
substrate should contain 86% of the parental nt and 14%
of other nts. Table 8 shows the fraction (fn) of DNA
molecules having n non-parental bases when 30 bases are
synthesized with reagents that contain fraction M of
the majority component. When M = 0.63096, f24 and higher
are less than 10^?. The entry "most" in Table 8 is the
number of changes that has the highest probability.
Note that substantial probability for multiple
substitutions only occurs if the fraction of parental
sequence (f0) is allowed to drop to around 10^?.
Mutagenesis of this sort can be applied to any part of
the protein at any time, but is most appropriate when
some binding to the target has been established. The
Nb base pairs of the DNA chain that are synthesized
with mixed reagents need not be contiguous. They are
picked so that between Nb/3 and Nb codons are affected
to various degrees. The residues picked for mutation
are picked with reference to the 3D structure of the
IPBD, if known. For example, one might pick all or
most of the residues in the principal and secondary
set. We may impose restrictions on the extent of
variation at each of these residues based on homologous
sequences or other data. The mixture of non-parental
nts need not be random; rather, mixtures can be biased
to give particular amino acid types specific
probabilities of appearance at each codon. For
example, one residue may contain a hydrophobic amino
acid in all known homologous sequences; in such a case,
the first and third base of that codon would be varied,
but the second would be set to T. Other examples of
how this might be done will be given in the Detailed
Example. This diffuse structure-directed mutagenesis
will reveal the subtle changes possible in protein
backbone associated with conservative interior changes,
such as V to I, as well as some not-so-subtle changes
that require concomitant changes at two or more
residues of the protein.
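The spiking level M and the distribution of mutation counts behind a table like Table 8 can be recomputed directly. The sketch assumes that each of the Nb varied bases independently receives a non-parental nt with probability 1 - M, which makes the number of changes binomial. Table 8 itself is not reproduced here, and the function names are hypothetical.

```python
import math


def majority_fraction(q, n_bases):
    """M such that M**n_bases == q, i.e. a fraction q of products is fully parental."""
    return 10 ** (math.log10(q) / n_bases)


def mutation_count_distribution(m, n_bases):
    """f_n for n = 0..n_bases: fraction of molecules with exactly n non-parental bases."""
    return [math.comb(n_bases, n) * m ** (n_bases - n) * (1 - m) ** n
            for n in range(n_bases + 1)]


if __name__ == "__main__":
    m = majority_fraction(0.01, 30)  # about 0.858: 86% parental nt, 14% others
    f = mutation_count_distribution(m, 30)
    most = max(range(len(f)), key=f.__getitem__)  # the "most" column: modal n
    print(round(m, 4), most, f[0])
```

With Nb = 30 and Q = 1%, the sketch gives M ≈ 0.858 and a most-likely count of four changes. With M = 0.63096 the parental fraction f0 falls to about 10^-6.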

For Focused Mutagenesis, we now consider the
distribution of nts that will be inserted at each
variegated codon. Each codon could be programmed
differently. If we have no information indicating that
a particular amino acid or class of amino acid is
appropriate, we strive to substitute all amino acids
with equal probability, because representation of one
PBD above the detectable level is wasteful. Equal
amounts of all four nts at each position in a codon
yield the amino acid distribution:



4/64 A    2/64 C    2/64 D    2/64 E    2/64 F    4/64 G    2/64 H
3/64 I    2/64 K    6/64 L    1/64 M    2/64 N    4/64 P    2/64 Q
6/64 R    6/64 S    4/64 T    4/64 V    1/64 W    2/64 Y    3/64 stop



This distribution has the disadvantage of giving two
basic residues for every acidic residue. In addition,
six times as much R, S, and L as W or M occur. If five
codons are synthesized with this distribution,
sequences encoding five Rs are 7776 times more abundant
than sequences encoding five Ws. To have W-W-W-W-W
present at detectable levels, we must have R-R-R-R-R
present in 7776-fold excess.
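Both disadvantages of the equimolar NNN codon can be recomputed from the genetic code: the 2:1 ratio of basic (K + R) to acidic (D + E) residues, and the 6^5 = 7776 excess of five Rs over five Ws. A minimal sketch, assuming the standard code and using hypothetical names:

```python
from collections import Counter
from itertools import product

BASES = "TCAG"
AA = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODE = {"".join(codon): aa for codon, aa in zip(product(BASES, repeat=3), AA)}


def aa_abundances(b1, b2, b3):
    """Abun(x) for every amino acid x ('*' = stop) given per-base nt fractions."""
    ab = Counter()
    for n1, n2, n3 in product(BASES, repeat=3):
        ab[CODE[n1 + n2 + n3]] += b1.get(n1, 0.0) * b2.get(n2, 0.0) * b3.get(n3, 0.0)
    return dict(ab)


equimolar = {nt: 0.25 for nt in BASES}
abun = aa_abundances(equimolar, equimolar, equimolar)
amino_only = {aa: f for aa, f in abun.items() if aa != "*"}
ratio = max(amino_only.values()) / min(amino_only.values())  # Abun(mfaa)/Abun(lfaa)
basic_to_acidic = (abun["K"] + abun["R"]) / (abun["D"] + abun["E"])

if __name__ == "__main__":
    print(ratio, ratio ** 5, basic_to_acidic)
```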

Consider the distribution of amino acids encoded
by one codon in a population of vgDNA. Let Abun(x) be
the abundance of DNA sequences coding for amino acid x;
Abun(x) is uniquely defined by the distribution of nts
at each base of the codon. For any distribution, there
will be a most-favored amino acid (mfaa) with abundance
Abun(mfaa) and a least-favored amino acid (lfaa) with
abundance Abun(lfaa). We seek the nt distribution that
allows all twenty amino acids and that yields the
largest ratio Abun(lfaa)/Abun(mfaa) subject to two
constraints. First, the abundances of acidic and basic
amino acids should be equal, lest we bias the PBDs
toward a particular charge. Second, the number of stop
codons should be kept as low as possible. Thus only nt
distributions that yield Abun(E) + Abun(D) =
Abun(R) + Abun(K) are considered, and the function
maximized is:

$$\left(1 - \mathrm{Abun}(\mathrm{stop})\right)\frac{\mathrm{Abun}(\mathrm{lfaa})}{\mathrm{Abun}(\mathrm{mfaa})}$$

We have simplified the search for an optimal nt
distribution by limiting the third base to T or G; C or
G at the third base would be equivalent. All amino
acids are possible, and the number of accessible stop
codons is reduced because TGA and TAA codons are
eliminated. The amino acids F, Y, C, H, N, I, and D
require T at the third base, while M, Q, and K
require G. Thus we use an equimolar mixture of T and G
at the third base.

A computer program, written as part of the present
invention and named "Find Optimum vgCodon" (see Table
9), varies the composition at bases 1 and 2, in steps
of 0.05, and reports the composition that gives the
largest value of the quantity
(Abun(lfaa)/Abun(mfaa))(1 - Abun(stop)). A vg codon is
symbolically defined by the nt distribution at each base:



          T     C     A     G
base 1    t1    c1    a1    g1
base 2    t2    c2    a2    g2
base 3    t3    c3    a3    g3

t1 + c1 + a1 + g1 = 1.0
t2 + c2 + a2 + g2 = 1.0
t3 = g3 = 0.5, c3 = a3 = 0.0



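The variation of the quantities t1, c1, a1, g1, t2, c2,
a2, and g2 is subject to the constraint that
Abun(E) + Abun(D) equals Abun(K) + Abun(R). Because the
third base is T or G (each 0.5), D and E arise only
from GAT and GAG, K only from AAG, and R from CGN and
AGG, so that

$$\mathrm{Abun}(E) + \mathrm{Abun}(D) = g_1 a_2$$

$$\mathrm{Abun}(K) + \mathrm{Abun}(R) = \frac{a_1 a_2}{2} + c_1 g_2 + \frac{a_1 g_2}{2}$$

$$g_1 a_2 = \frac{a_1 a_2}{2} + c_1 g_2 + \frac{a_1 g_2}{2}$$

Solving for g2, we obtain

$$g_2 = \frac{g_1 a_2 - 0.5\, a_1 a_2}{c_1 + 0.5\, a_1}$$

In addition,

$$t_1 = 1 - a_1 - c_1 - g_1$$

$$t_2 = 1 - a_2 - c_2 - g_2$$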
We vary a1, c1, g1, a2, and c2 and then calculate t1,
g2, and t2. Initially, variation is in steps of 5%.
Once an approximately optimum distribution of nts is
determined, the region is further explored with steps
of 1%. The logic of this program is shown in Table 9.
The optimum distribution is:



Optimum vgCodon

          T      C      A      G
base 1    0.26   0.18   0.26   0.30
base 2    0.22   0.16   0.40   0.22
base 3    0.5    0.0    0.0    0.5

and yields DNA molecules encoding each type of amino acid
with the abundances shown in Table 10.
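The search described above can be sketched as a grid over a1, c1, g1, a2 and c2, with g2 fixed by the charge-balance equation and t1 and t2 taken from the normalizations. This is not the Table 9 program, which is not reproduced here. The names are hypothetical, and the sketch performs only the coarse pass: the 1% refinement around the best point is omitted. In pure Python a 5% grid takes tens of seconds, so the usage below runs a 10% grid.

```python
from itertools import product

BASES = "TCAG"
AA = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODE = {"".join(codon): aa for codon, aa in zip(product(BASES, repeat=3), AA)}
AMINO_ACIDS = sorted(set(AA) - {"*"})
THIRD_BASE = {"T": 0.5, "C": 0.0, "A": 0.0, "G": 0.5}  # t3 = g3 = 0.5


def abundances(b1, b2, b3):
    """Abun(x) for each amino acid x ('*' = stop) from per-base nt fractions."""
    ab = dict.fromkeys(AMINO_ACIDS + ["*"], 0.0)
    for n1, n2, n3 in product(BASES, repeat=3):
        ab[CODE[n1 + n2 + n3]] += b1[n1] * b2[n2] * b3[n3]
    return ab


def score(ab):
    """(1 - Abun(stop)) * Abun(lfaa) / Abun(mfaa); 0 if any amino acid is absent."""
    amino = [ab[x] for x in AMINO_ACIDS]
    return (1.0 - ab["*"]) * min(amino) / max(amino)


def find_optimum(step=0.05, tol=1e-9):
    """Coarse grid search; g2 follows from Abun(D) + Abun(E) = Abun(K) + Abun(R)."""
    grid = [i * step for i in range(int(round(1.0 / step)) + 1)]
    best_score, best = 0.0, None
    for a1, c1, g1 in product(grid, repeat=3):
        t1 = 1.0 - a1 - c1 - g1
        if t1 < -tol or c1 + 0.5 * a1 == 0.0:
            continue
        for a2, c2 in product(grid, repeat=2):
            g2 = (g1 * a2 - 0.5 * a1 * a2) / (c1 + 0.5 * a1)
            t2 = 1.0 - a2 - c2 - g2
            if g2 < -tol or t2 < -tol:
                continue
            b1 = {"T": max(t1, 0.0), "C": c1, "A": a1, "G": g1}
            b2 = {"T": max(t2, 0.0), "C": c2, "A": a2, "G": max(g2, 0.0)}
            s = score(abundances(b1, b2, THIRD_BASE))
            if s > best_score:
                best_score, best = s, (b1, b2)
    return best_score, best


if __name__ == "__main__":
    print(find_optimum(step=0.1))
```

Scoring the published "Optimum vgCodon" with `score` gives about 0.39. An equimolar mixture at bases 1 and 2 with the same T/G third base gives about 0.32.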

The computer that controls a DNA synthesizer, such
as the Milligen 7500, can be programmed to synthesize
any base of an oligo-nt with any distribution of nts by
taking some nt substrates (e.g. nt phosphoramidites)
from each of two or more reservoirs. Alternatively, nt
substrates can be mixed in any ratios and placed in one
of the extra reservoirs for so-called "dirty bottle"
synthesis. Either of these methods amounts to
specifying the nt distribution. The actual nt
distribution obtained will differ from the specified nt
distribution due to several causes, including: a)
differential inherent reactivity of nt substrates, and
b) differential deterioration of reagents. It is
possible to compensate partially for these effects, but
some residual error will occur. We denote the average
discrepancy between specified and observed nt fraction
as Serr:



$$S_{err} = \sqrt{\left\langle \left(\frac{f_{obs} - f_{spec}}{f_{spec}}\right)^{2} \right\rangle}$$

where fobs = amount of one type of nt found at a
base and fspec = amount of that type of nt that
was specified at the same base. The average is over
all specified types of nts and over a number (e.g. 10
or 20) of different variegated bases. By hypothesis, the
actual nt distribution at a variegated base will be
within 5% of the specified distribution. Actual DNA
synthesizers and DNA synthetic chemistry may have
different error levels. It is the user's
responsibility to determine Serr for the
synthesizer and chemistry employed by the user.
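The Serr formula is a root-mean-square fractional error, averaged over every specified nt type at every measured variegated base. A minimal sketch follows. The function name is hypothetical, and "specified types" is interpreted as nts with a nonzero specified fraction, which keeps the division defined.

```python
import math


def s_err(specified, observed):
    """RMS fractional discrepancy between specified and observed nt fractions.

    specified, observed: lists of per-base dicts (nt -> fraction), one entry per
    variegated base. Only nts with a nonzero specified fraction are averaged.
    """
    terms = [
        ((obs.get(nt, 0.0) - f_spec) / f_spec) ** 2
        for spec, obs in zip(specified, observed)
        for nt, f_spec in spec.items()
        if f_spec > 0
    ]
    return math.sqrt(sum(terms) / len(terms))
```

For example, specifying 0.5 T / 0.5 G and observing 0.525 T / 0.475 G gives a fractional error of 5% on each nt, so Serr = 0.05.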

To determine the possible effects of errors in nt
composition on the amino-acid distribution, we modified
the program "Find Optimum vgCodon" in four ways:

1) the fraction of each nt in the first two bases
is allowed to vary from its optimum value times
(1 - Serr) to the optimum value times (1 + Serr) in
seven equal steps (Serr is the hypothetical
fractional error level entered by the user); the
sum of nt fractions at one base always equals 1.0,

2) g2 is varied in the same manner as a2, i.e. we
dropped the restriction that Abun(D) + Abun(E) =
Abun(K) + Abun(R),

3) t3 and g3 are varied from 0.5 times (1 - Serr)
to 0.5 times (1 + Serr) in three equal steps,

4) the smallest ratio Abun(lfaa)/Abun(mfaa) is
sought.

In actual experiments, we will direct the synthesizer
to produce the optimum nt distribution "Optimum
vgCodon" given above. Incomplete control over DNA
chemistry may, however, cause us to actually obtain the
following distribution, which is the worst that can be
obtained if all nt fractions are within 5% of the
amounts specified in "Optimum vgCodon". A
corresponding table can be calculated for any given
Serr using the program "Find worst vgCodon within Serr
of given distribution," given in Table 11.
Optimum vgCodon, worst 5% errors

          T      C      A      G
base 1    0.251  0.189  0.273  0.287
base 2    0.209  0.160  0.400  0.231
base 3    0.475  0.0    0.0    0.525



This distribution yields DNA encoding different 
amino acids at the abundances shown in Table 12. 

If five codons are synthesized with reagents mixed
so as to produce the nt distribution "Optimum vgCodon",
and if we actually obtained the nt distribution
"Optimum vgCodon, worst 5% errors", then DNA sequences
encoding the mfaa at all of the five codons are about
277 times as likely as DNA sequences encoding the lfaa
at all of the five codons; about 24% of the DNA
sequences will have a stop codon in one or more of the
five codons.
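The 277-fold figure and the stop-codon fraction can be checked by computing the amino-acid abundances of the worst-case distribution directly. The sketch assumes the standard genetic code, five independently synthesized codons, and the rounded table values above. The names are hypothetical.

```python
from itertools import product

BASES = "TCAG"
AA = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODE = {"".join(codon): aa for codon, aa in zip(product(BASES, repeat=3), AA)}


def abundances(b1, b2, b3):
    """Abun(x) for each amino acid x ('*' = stop) from per-base nt fractions."""
    ab = {}
    for n1, n2, n3 in product(BASES, repeat=3):
        aa = CODE[n1 + n2 + n3]
        ab[aa] = ab.get(aa, 0.0) + b1[n1] * b2[n2] * b3[n3]
    return ab


WORST = (
    {"T": 0.251, "C": 0.189, "A": 0.273, "G": 0.287},
    {"T": 0.209, "C": 0.160, "A": 0.400, "G": 0.231},
    {"T": 0.475, "C": 0.0, "A": 0.0, "G": 0.525},
)

ab = abundances(*WORST)
amino = {aa: f for aa, f in ab.items() if aa != "*"}
mfaa = max(amino, key=amino.get)
lfaa = min(amino, key=amino.get)
ratio_5 = (amino[mfaa] / amino[lfaa]) ** 5   # five variegated codons
p_stop_5 = 1.0 - (1.0 - ab["*"]) ** 5        # at least one TAG among five codons

if __name__ == "__main__":
    print(mfaa, lfaa, round(ratio_5, 1), round(p_stop_5, 3))
```

Here R is the most-favored and F the least-favored amino acid. The five-codon ratio comes out near 277, and about 24% of sequences carry at least one TAG. The 7776/277 ≈ 28 improvement quoted below follows directly.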

When five codons are synthesized using equimolar mixtures at bases 1 and 2, (Abun(mfaa)/Abun(lfaa))^5 = 7776. If we program the optimum nt distribution and come within 5%, then (Abun(mfaa)/Abun(lfaa))^5 = 277. The total number of different PBDs is unchanged, but the least-favored sequence is about 28 times more abundant. Detecting the least-favored amino-acid sequence when varying four residues with equimolar nts at each varied base requires as sensitive a separation system as does detecting the least-favored amino-acid sequence when varying five residues with the optimized nt distribution.
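The arithmetic behind this comparison is sketched below: taking fifth roots turns the quoted five-codon ratios into per-codon ratios, and their quotient gives the gain in abundance of the least-favored sequence. The variable names are illustrative.

```python
# Five-codon ratios (Abun(mfaa)/Abun(lfaa))^5 as quoted in the text.
ratio5_equimolar = 7776
ratio5_optimized = 277

per_codon_equimolar = ratio5_equimolar ** (1 / 5)   # 6.0, up to float rounding
per_codon_optimized = ratio5_optimized ** (1 / 5)   # about 3.08
gain = ratio5_equimolar / ratio5_optimized          # about 28-fold
print(round(per_codon_equimolar, 2), round(per_codon_optimized, 2), round(gain, 1))
```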



By hypothesis, the distribution "Optimum vgCodon" is used in the second version of the second variegation of hypothetical example 2. The abundance of the DNA encoding each type of amino acid is, however, taken from Table 12. The abundance of DNA encoding the parental amino acid sequence is:

Amount(parental seq.) = Abun(F24) * Abun(G30) * Abun(D34) * Abun(E42) * Abun(T47)
                      = Abun(F) * Abun(G) * Abun(D) * Abun(E) * Abun(T)
                      = .0249 x .0663 x .0545 x .0602 x .0437
                      = 2.4 x 10^-7
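A one-line check of this product, using the rounded abundances quoted in the text (a sketch; `math.prod` needs Python 3.8 or later, and the dictionary keys are illustrative labels).

```python
import math

# Abundance of the parental amino acid at each variegated residue (values from the text).
parental = {"F24": 0.0249, "G30": 0.0663, "D34": 0.0545, "E42": 0.0602, "T47": 0.0437}

amount = math.prod(parental.values())
print(f"{amount:.2e}")
```

The printed value is consistent with 2.4 x 10^-7.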



Therefore, DNA encoding the PPBD sequence, as well as very many related sequences, will be present in sufficient quantity to be detected, and we are assured that the process will be progressive.

We use the following procedure to determine whether a given level of variegation is practical:

1) from a) the intended nt distribution at each base of a variegated codon, and b) S_err (the error level in mixed DNA synthesis), calculate the abundances of DNA sequences coding for each amino acid and stop,

2) calculate the abundance of DNA encoding the PPBD sequence by multiplying the abundances of the parental amino acid at each variegated residue.



The abundances used in the procedure above are calculated from the worst distribution that is within S_err of the specified distribution. A variegation that ensures that the PPBD sequence can be recovered is practical. PPBD can be recovered if the abundance of PPBD-encoding DNA is larger than both 1/M_ntv and 1/C_sensi. Preferably, the abundance of PPBD-encoding DNA is 3 to 10 times higher than both 1/M_ntv and 1/C_sensi to provide a margin of redundancy. M_ntv is the number of transformants that can be made from Y_D of DNA. With current technology M_ntv is approximately 5 x 10^, but the exact value depends on the details of the procedures adopted by the user. Improvements in technology that allow more efficient a) synthesis of DNA, b) ligation of DNA, or c) transformation of cells will raise the value of M_ntv. C_sensi is the sensitivity of the affinity separation; improvements in affinity separation will raise C_sensi. As the smaller of M_ntv and C_sensi is increased, higher levels of variegation may be used. For example, if C_sensi is 1 in 10^ and M_ntv is 10^, then improvements in C_sensi are less valuable than improvements in M_ntv.
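The recovery criterion can be written as a small predicate. This is a sketch under stated assumptions: `is_practical` and the capacities used in the example are hypothetical, and "larger than both 1/M_ntv and 1/C_sensi" is expressed as larger than 1/min(M_ntv, C_sensi), which makes explicit that the smaller of the two capacities sets the limit.

```python
def is_practical(abun_ppbd, m_ntv, c_sensi, margin=1.0):
    """True if PPBD-encoding DNA exceeds margin/M_ntv and margin/C_sensi."""
    threshold = margin / min(m_ntv, c_sensi)  # = margin * max(1/M_ntv, 1/C_sensi)
    return abun_ppbd > threshold


# Hypothetical capacities, chosen only for illustration.
M_NTV, C_SENSI = 1e8, 1e10
abun_ppbd = 2.4e-7
print(is_practical(abun_ppbd, M_NTV, C_SENSI))             # minimal criterion
print(is_practical(abun_ppbd, M_NTV, C_SENSI, margin=10))  # preferred 10x margin
```

Raising `C_SENSI` in this example changes nothing, because `M_NTV` is the smaller capacity; only raising `M_NTV` lowers the threshold.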

A level of variegation that allows recovery of the PPBD has two properties:

1) we cannot regress, because the PPBD is available,

2) an enormous number of multiple changes related to the PPBD are available for selection, and we are able to detect and benefit from these changes.

It is very unlikely that all of the variants will be worse than the PPBD; we require the presence of PPBD at detectable levels to ensure that all the sequences present are indeed related to PPBD.
The user must adjust the list of residues to be varied and the levels of variegation at each residue until the calculated variegation is within the bounds set by M_ntv and C_sensi.

Preferably, we also consider the interactions between the sites of variegation and the surrounding DNA. If the method of mutagenesis to be used is replacement of a cassette, we consider whether the variegation will generate gratuitous restriction sites and whether they seriously interfere with the intended introduction of diversity. We reduce or eliminate gratuitous restriction sites by appropriate choice of variegation pattern and silent alteration of codons neighboring the sites of variegation. See the Detailed
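One way to screen a cassette design for gratuitous restriction sites is to write each variegated base as an IUPAC ambiguity code and ask, window by window, whether every base of a recognition site is allowed at that position. The sketch below is a hypothetical helper, not part of the described method; it scans only the given strand, which suffices for palindromic sites such as EcoRI (GAATTC) and HindIII (AAGCTT).

```python
# IUPAC nucleotide ambiguity codes -> allowed bases.
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "AG", "Y": "CT", "S": "CG", "W": "AT", "K": "GT", "M": "AC",
    "B": "CGT", "D": "AGT", "H": "ACT", "V": "ACG", "N": "ACGT",
}


def possible_site_positions(pattern, site):
    """Start indices where some sequence matching `pattern` could contain `site`.

    `pattern` is a cassette in IUPAC codes (variegated bases as ambiguity codes);
    `site` is a concrete recognition sequence. Only this strand is scanned.
    """
    pattern, site = pattern.upper(), site.upper()
    hits = []
    for start in range(len(pattern) - len(site) + 1):
        window = pattern[start:start + len(site)]
        if all(base in IUPAC[code] for code, base in zip(window, site)):
            hits.append(start)
    return hits


cassette = "GCNNKCTTG"  # hypothetical: an NNK codon flanked by fixed bases
for name, site in [("EcoRI", "GAATTC"), ("HindIII", "AAGCTT")]:
    print(name, possible_site_positions(cassette, site))
```

Here the NNK codon can spell the AAG of a HindIII site next to the fixed CTT, which is the kind of gratuitous site a silent change in the flanking codon would remove.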



Sec. 14.1: Insertion of synthetic vgDNA into an OCV:



In the case of cassette mutagenesis, the restriction sites that were introduced when the gene for the inserted domain was synthesized are used to introduce the synthetic vgDNA into a plasmid or other OCV. Restriction digestions and ligations are performed by standard methods (AUSU87).

In the case of single-stranded-oligonucleotide-directed mutagenesis, synthetic vgDNA is used to create diversity in the vector (BOTS85).



Sec. 14.2: Transformation of cells:

The present invention is not limited to any one method of transforming cells with DNA. The following procedure is a modification of that of Maniatis (p250, MANI82). This procedure is only one example of how the necessary transformations may be performed. The procedure produces approximately (V_c/25) x 10^7 or more transformants. The user picks a value for V_c, the initial volume of the cell culture, to provide the desired number of transformants. All water is triple distilled and is treated with activated charcoal for 24 hours.

1) culture E. coli in V_c ml of LB broth at 37°C until cell density reaches 5 x 10^7 to 7 x 10^7 cells/ml,

2) chill on ice for 65 minutes, then centrifuge the cell suspension at 4000g for 5 minutes at 4°C,

3) discard supernatant; resuspend the cells in ml of an ice-cold, sterile solution of 60 mM CaCl2,

4) chill on ice for 15 minutes, and then centrifuge at 4000g for 5 minutes at 4°C,

5) discard supernatant; resuspend cells in 2 x V_c/25 ml of ice-cold, sterile 60 mM CaCl2; store cells at 4°C for from 10 minutes to 24 hours; transformation efficiency increases by about 4-fold in the first 24 hours and then returns to the original value,

6) add DNA in ligation or TE buffer to V_c/250 ml of cells; mix and store on ice for 30 minutes,

7) heat shock cells at 42°C for an appropriate amount of time,

8) add V_c/25 ml LB broth and incubate at 37°C for 1 hour,

9) plate cells on LB agar containing antibiotic,

10) harvest GPs in appropriate manner.
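The volumes in the procedure scale with V_c, so they can be tabulated from a target number of transformants. This sketch assumes a yield of (V_c/25) x 10^7 transformants with V_c in ml; `transformation_plan`, its `yield_per_vc25` parameter and the dictionary keys are hypothetical names, and volumes whose amounts are not given in the procedure (such as the step 3 resuspension) are omitted.

```python
def transformation_plan(desired_transformants, yield_per_vc25=1e7):
    """Scale the culture and the stated reagent volumes (ml) to a target yield.

    Assumes the procedure gives about (V_c / 25) * yield_per_vc25 transformants.
    """
    v_c = 25.0 * desired_transformants / yield_per_vc25   # step 1 culture volume
    return {
        "step1_culture_ml": v_c,
        "step5_cacl2_resuspension_ml": 2 * v_c / 25,
        "step6_cells_per_transformation_ml": v_c / 250,
        "step8_lb_recovery_ml": v_c / 25,
    }


print(transformation_plan(4e8))
```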

It is not necessary to isolate transformed cells between transformation and affinity separation. We prefer to have transformed cells at high concentration so that they can be plated densely on relatively few plates. For this purpose, steps (9) and (10) may be replaced with a procedure in which the cells in step (8) are further diluted with LB broth and the selecting antibiotic is added. In the case of ampicillin, lysis of sensitive cells occurs, and resistant cells are enriched by centrifugation at 2 to 3 h after addition of antibiotic.

One routinely obtains between 10^7 and 5 x 10^8 transformants/µg of CCC DNA. Ligation efficiency ranges from 0.1% for blunt-blunt insertions to as much as 15% for sticky-sticky insertions. For large transformations, it may be desirable to purify DNA between ligation and transformation, because unligated DNA is thought to compete with CCC DNA for entry into the competent cells. Only a small fraction of cells are competent, typically 0.1%. The heat shock has been optimized for transformation reactions carried out in a volume of 200 µl in a plastic Eppendorf tube; optimizing this step for larger volumes is possible.
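A rough estimate of how many transformants a ligation can yield multiplies the amount of DNA, the transformation efficiency per µg of CCC DNA, and the ligation efficiency. This sketch assumes that only correctly ligated molecules give transformants and ignores the competition from unligated DNA mentioned above; `expected_transformants` and the input values in the example are hypothetical.

```python
def expected_transformants(ug_dna, transformants_per_ug, ligation_efficiency):
    """Rough count: DNA amount x efficiency per ug CCC DNA x fraction correctly ligated."""
    return ug_dna * transformants_per_ug * ligation_efficiency


# Hypothetical inputs: 1 ug of DNA, 1e8 transformants/ug, 15% (sticky-end) ligation.
print(f"{expected_transformants(1.0, 1e8, 0.15):.1e}")
```

Because the result feeds directly into M_ntv, a blunt-end design (0.1%) would need about 150 times more DNA than a sticky-end design (15%) for the same count under this model.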
Sec. 14.3: Growth of the GP(vgPBD) population:

The transformed cells are grown first under non-selective conditions that allow expression of plasmid genes and then selected to kill untransformed cells. Transformed cells are then induced to express the osp-pbd gene at the appropriate level of induction, as determined in Sec. 10.1. The GPs carrying the PBD are harvested by a method appropriate to the package.

A high level of diversity can be generated by in vitro variegated synthesis of DNA, and this diversity can be maintained passively through several generations in an organism without positive selective pressure. Loss or reduction in frequency of deleterious mutations is advantageous for the purposes of the present invention. As we do not know how one might press E. coli or any other kind of cell to actively maintain diversity, we specify that the vgDNA must be used to prepare plasmids, that the plasmids are used to transform cells, and that the selection must be performed before more than a few generations elapse. Moreover, subdividing the variegated population before amplification in an organism by removing a small sample (less than 10%) for further work would result in loss of diversity; therefore, one should use all or most of the synthetic DNA and most or all of the transformed cells.
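Why sub-sampling loses diversity can be seen with a simple model, assumed here rather than taken from the text: if each copy of a variant independently lands in a sample with probability f, a variant present in c copies is retained with probability 1 - (1 - f)^c. The function name is illustrative.

```python
def fraction_retained(sample_fraction, copies):
    """Chance that a variant with `copies` molecules is represented in the sample."""
    return 1.0 - (1.0 - sample_fraction) ** copies


for copies in (1, 5, 20):
    print(copies, round(fraction_retained(0.10, copies), 3))
```

With a 10% sample, a variant present once survives only one time in ten, and since most sequences in a highly variegated population are rare, most of the diversity is lost.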



Isolation of GP(PBD)s with binding-to-target phenotypes:
The harvested packages are now enriched for the binding-to-target phenotype by use of affinity separation involving the target material immobilized on an affinity matrix. Packages that fail to bind to the target material are washed away. If the packages are bacteriophage or endospores, it may be desirable to include a bactericidal agent, such as azide, in the buffer to prevent bacterial growth. The buffers used in chromatography must include: a) any ions or other solutes needed to stabilize the target, and b) any ions or other solutes needed to stabilize the PBDs derived from the IPBD.

Sec. 15.1: Attachment of target material to a column:

Affinity column chromatography is the preferred method of affinity separation, but other affinity separation methods may be used. A variety of commercially available support materials for affinity chromatography are used. These include derivatized beads to which the target material is covalently linked, or non-derivatized material to which the target material adheres irreversibly.

Suppliers of support material for affinity chromatography include: Applied Protein Technologies, Cambridge, MA; Bio-Rad Laboratories, Rockville Center, NY; Pierce Chemical Company, Rockford, IL. Target materials are attached to the matrix in accord with the directions of the manufacturer of each matrix preparation, with consideration of good presentation of the target.



Sec. 15.2: Reducing selection due to non-specific binding:
We reduce non-specific binding of GP(PBD)s to the matrix that bears the target in two ways:

1) we treat the column with blocking agents, such as genetically defective GPs or a solution of protein, before the population of GP(vgPBD)s is chromatographed, and

2) we pass the population of GP(vgPBD)s over a matrix containing no target, or a different target from the same class as the actual target, prior to affinity chromatography.

Step (1) above saturates any non-specific binding that the affinity matrix might show toward wild-type GPs or proteins in general; step (2) removes components of our population that exhibit non-specific binding to the matrix or to molecules of the same class as the target. If the target were horse heart myoglobin, for example, a column supporting bovine serum albumin could be used to trap GPs exhibiting PBDs with strong non-specific binding to proteins. If cholesterol were the target, then a hydrophobic compound, such as p-tertiarybutylbenzyl alcohol, could be used to remove GPs displaying PBDs having strong non-specific binding to hydrophobic compounds. It is anticipated that PBDs that fail to fold or that are prematurely terminated will be non-specifically sticky. These sequences could outnumber the PBDs having desirable binding properties. Thus, the capacity of the initial column that removes indiscriminately adhesive PBDs should be greater (e.g. 5-fold greater) than that of the column that supports the target molecule.
Variation in the support material (polystyrene, glass, agarose, cellulose, etc.) in analysis of clones carrying SBDs is used to eliminate enrichment for packages that bind to the support material rather than the target.



Sec. 15.3: Eluting the column:



To separate the GP(PBD)s that carry PBDs showing actual binding to the target from GP(PBD)s that carry PBDs not actually showing binding to the target, the population of GPs is applied to an affinity matrix under conditions compatible with the intended use of the binding protein, and the population is fractionated by passage of a gradient of some solute over the column. The process enriches for PBDs having affinity for the target and for which the affinity for the target is least affected by the eluants used. The enriched fractions are those containing viable GPs that elute from the column at greater concentration of the eluant.

Any ions or cofactors needed for stability of PBDs (derived from IPBD) or target must be included in initial and elution buffers at appropriate levels. We first remove GP(PBD)s that do not bind the target by washing the matrix with the volume of the initial buffer required to bring the optical density (at 260 nm or 280 nm) back to base line plus one void volume (V_v), but not more than 5 V_v. The column is then eluted with a gradient of increasing: a) salt, b) (decreasing pH), c) neutral solutes, d) temperature (increasing or decreasing), or e) some combination of these factors. The solutes in each of the first three gradients have been found generally to weaken non-covalent interactions between proteins and bound molecules. Salt is the most preferred solute for gradient formation in most cases. Other solutes that generally weaken non-covalent interactions between proteins and bound molecules may also be used. "Salt" includes solutions containing any or all of the following ionic species:



Na+          K+           Ca++         Mg++
Li+          Sr++         Ba++         Cs+
Cl-          Br-          SO4--        HSO4-
PO4---       HPO4--       CO3--        HCO3-
Acetate      Citrate      Standard 1-  Standard
Guanidinium  Amino Acids  Nucleotides  Ci



Other ionic or neutral solutes may be used. All solutes are subject to the necessity that they not kill the genetic packages. Because bacteria continue to metabolize during affinity separation, the choice of buffer components is more restricted for bacteria than for bacteriophage or spores. Neutral solutes, such as ethanol, acetone, ether, or urea, are frequently used in protein purification and are known to weaken non-covalent interactions between proteins and other molecules. Many of these species are, however, very harmful to bacteria and bacteriophage. Bacterial spores, on the other hand, are impervious to most neutral solutes. Several passes may be made through the steps in Sec. 15. Different solutes may be used in different analyses: salt in one, pH in the next, etc.
genetic material is essential. If cells, spores, or virions bind irreversibly to the matrix but are not killed, we can recover the information through in situ cell division, germination, or infection, respectively. Proteolytic degradation of the packages and recovery of DNA is not preferred.

Although degradation of the bound GPs and recovery of genetic material is a possible mode of operation, inadvertent inactivation of the GPs is very deleterious. It is preferred that maximum limits are determined for solutes that do not inactivate the GPs or denature the target or the column. If the affinity matrices are expendable, one may use conditions that denature the column to elute GPs; before the target is denatured, a portion of the affinity matrix should be removed for possible use as an inoculum. As the GPs are held together by protein-protein interactions and other non-covalent molecular interactions, there will be cases in which the molecular package binds so tightly to the target molecules on the affinity matrix that the GPs cannot be washed off in viable form. This will only occur when very tight binding has been obtained. In these cases, methods (3) through (5) above can be used to obtain the bound packages or the genetic messages from the affinity matrix.

It is possible, by manipulation of the elution conditions, to isolate PBDs that bind to the target at one pH (pH_b) but not at another pH (pH_e). The population is applied at pH_b and the column is washed thoroughly at pH_b. The column is then eluted with buffer at pH_e, and GPs that come off at the new pH are collected and cultured. Similar procedures may be used for other solution parameters, such as temperature. For example, GP(vgPBD)s could be applied to a column supporting insulin. After eluting with salt to remove GPs with little or no binding to insulin, we elute with salt and glucose to liberate GPs that display PBDs that bind insulin or glucose in a competitive manner.

Sec. 15.5: Amplifying the Enriched Packages

Viable GPs having the selected binding trait are amplified by culture in a suitable medium or, in the case of phage, by infection into a host so cultivated. If the GPs have been inactivated by the chromatography, the OCV carrying the osp-pbd gene must be recovered from the GP and introduced into a new, viable host.

Sec. 15.6: Determining whether further enrichment is needed:



The probability of isolating a GP with improved binding increases by a factor of C_eff with each separation cycle. Let N be the number of distinct amino-acid sequences produced by the variegation. We want to perform K separation cycles before attempting to isolate an SBD, where K is such that the probability of isolating a single SBD is 0.10 or higher:

K = the smallest integer >= log10(0.10 N) / log10(C_eff)

For example, if N were 1.0 x 10^6 and C_eff = 6.31 x 10^2, then log10(1.0 x 10^6) / log10(6.31 x 10^2) = 6.0000/2.8000 = 2.14. Therefore we would attempt to isolate SBDs after the third separation cycle. After only two separation cycles, the probability of finding an SBD is (6.31 x 10^2)^2 / (1.0 x 10^6) = 0.4, and attempting to isolate SBDs might be profitable.
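The enrichment model can be sketched as follows, assuming an SBD starts at frequency 1/N and is enriched C_eff-fold per cycle (capped at 1). Under this model the ratio log10(N)/log10(C_eff) computed in the example corresponds to a target probability of 1, while the 0.10 criterion gives K = 2, which matches the probability of about 0.4 quoted for two cycles. The function names are illustrative.

```python
import math


def sbd_probability(n_sequences, c_eff, cycles):
    """Chance of picking the SBD after `cycles` rounds (starts at 1/N, capped at 1)."""
    return min(1.0, c_eff ** cycles / n_sequences)


def cycles_needed(n_sequences, c_eff, p_target=0.10):
    """Smallest integer K >= 0 with C_eff**K / N >= p_target."""
    k = math.log10(p_target * n_sequences) / math.log10(c_eff)
    return max(0, math.ceil(k))


N, C_EFF = 1.0e6, 6.31e2
print(round(sbd_probability(N, C_EFF, 2), 2))                    # after two cycles
print(cycles_needed(N, C_EFF, 0.10), cycles_needed(N, C_EFF, 1.0))
```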

Clonal isolates from the last fraction eluted in Sec. 15.3 that contains any viable GPs, as well as clonal isolates obtained by culturing an inoculum taken from the affinity matrix, are cultured in a growth step that is similar to that described in Sec. 14.3. If K separation cycles have been completed, samples from a number, e.g. 32, of these clonal isolates are tested for elution properties on the (target) column. If none of the isolated, genetically pure GPs show improved binding to target, or if K cycles have not yet been completed, then we pool and culture, in a manner similar to that set forth in Sec. 14.3, the GPs from the last few fractions eluted (see Sec. 15.4) that contained viable GPs and the GPs obtained by culturing an inoculum taken from the column matrix. We then repeat the enrichment procedure described in Sec. 15. This cyclic enrichment may continue for a set number of passes or until an SBD is isolated.

If one or more of the isolated GPs has improved retention on the (target) column, we determine whether the retention of the candidate SBDs is due to affinity for the target material as follows. A second column is prepared using a different support matrix with the target material bound at the optimal density. The elution volumes of candidate GP(SBD)s, under the same elution conditions as used previously (see Sec. 15.3), are compared to each other and to GP(PPBD of this round). If one or more candidate GP(SBD)s has a larger elution volume than GP(PPBD of this round), then we pick the GP(SBD) having the highest elution volume and proceed to characterize the population (see Sec. 15.7). If none of the candidate GP(SBD)s has a higher elution volume than GP(PPBD of this round), then we pool and culture, in a manner similar to that used previously (Sec. 15.3), the GPs from the last few fractions that contained viable GPs and the GPs obtained by culturing an inoculum taken from the column matrix. We then repeat the enrichment procedure of Sec. 15.

If all of the SBDs show binding that is superior to the PPBD of this round, we pool and culture the GPs from the last fraction that contains viable GPs and from the inoculum taken from the column. This population is re-chromatographed for at least one pass to fractionate the GPs further based on

If an RNA phage were used as GP, the RNA would either be cultured with the assistance of a helper phage or be reverse transcribed and the DNA amplified. The amplified DNA could then be sequenced or subcloned into suitable plasmids.

Sec. 15.7: Characterizing the Population:

We characterize members of the population showing desired binding properties by genetic and biochemical methods. We obtain clonal isolates and test these strains by genetic and affinity methods to determine genotype and phenotype with respect to binding to target. For several genetically pure isolates that show binding, we demonstrate that the binding is caused by the artificial chimeric gene by excising the osp-sbd gene and crossing it into the parental GP. We also ligate the deleted backbone of each GP from which the osp-sbd is removed, and demonstrate that each backbone alone cannot confer binding to the target on the GP. We sequence the osp-sbd gene from several clonal isolates. Primers for sequencing are chosen from the DNA flanking the osp-sbd gene or from parts of the osp-sbd gene that are not variegated.

Sec. 15.8: Testing of binding affinity: 




For one or more clonal isolates, we subclone the sbd gene fragment, without the osp fragment, into an expression vector such that each SBD can be produced as a free protein. Because numerous unique restriction sites were built into the inserted domain, it is easy to subclone the gene at any time. Each SBD protein is purified by normal means, including affinity chromatography. Physical measurements of the strength of binding are then made on each free SBD protein by one of the following methods: 1) alteration of the Stokes radius as a function of binding of the target material, measured by characteristics of elution from a molecular sizing column such as agarose, 2) retention of radiolabeled binding protein on a spun affinity column to which the target material has been affixed, or 3) retention of radiolabeled target material on a spun affinity column to which the binding protein has been affixed. The measurements of binding for each free SBD are compared to the corresponding measurements of binding for the PPBD.

In each assay, we measure the extent of binding as a function of the concentration of each protein and of other relevant physical and chemical parameters, such as salt concentration, temperature, pH, and prosthetic group concentrations (if any).




In addition, the SBD with highest affinity for the target from each round is compared to the best SBD of the previous round (IPBD for the first round) and to the IPBD (second and later rounds) with respect to affinity for the target material. Successive rounds of mutagenesis and selection-through-binding yield increasing affinity until desired levels are achieved.

If we find that the binding is not yet sufficient, we must decide which residues to vary next (see Sec. 16.0). If the binding is sufficient, then we now have an expression vector bearing a gene encoding the desired novel binding protein.

Sec. 15.9: Other Affinity Separation Means:

FACS may be used to separate GPs that bind fluorescent-labeled target with the optimized parameters determined in Part II. We discriminate against artifactual binding to the fluorescent label by using two or more different dyes, chosen to be structurally different. GPs isolated using target labeled with a first dye are cultured. These GPs are then tested with target labeled with a second dye.

Electrophoretic affinity separation uses unaltered target, so that only other ions in the buffer can give rise to artifactual binding. Artifactual binding to the gel material gives rise to retardation independent of field direction and so is easily eliminated. A variegated population of GPs will have a variety of charges. The following 2D electrophoretic procedure accommodates this variation in the population.



First the variegated population of GPs is electrophoresed in a gel that contains no target material. The electrophoresis continues until the GPs are distributed along the length of the lane. The gels described by Serwer for phage are very low in agarose and lack mechanical stability. The target-free lane in which the initial electrophoresis is conducted is separated from a square of gel that contains target material by a removable baffle. After the first pass, the baffle is removed and a second electrophoresis is conducted at right angles to the first. GPs that do not bind target migrate with unaltered mobility, while GPs that do bind target separate from the majority that do not. A diagonal line of non-binding GPs will form. This line is excised and discarded. Other parts of the gel are dissolved and the GPs cultured.
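The diagonal logic can be illustrated with a toy classifier. It assumes both passes use comparable field strength and run time, so a non-binding GP travels about the same distance in each dimension, and it takes binders to be retarded in the target-containing gel; `off_diagonal`, `tol` and the measurements are hypothetical.

```python
def off_diagonal(spots, tol=0.05):
    """Return spots retarded in the second (target-containing) dimension.

    Each spot is (d1, d2): migration distance in the target-free first pass and
    in the second pass through the target gel. Non-binders lie near d2 == d1.
    """
    return [(d1, d2) for d1, d2 in spots if d2 < d1 * (1 - tol)]


# Hypothetical measurements (arbitrary units).
print(off_diagonal([(10.0, 10.0), (8.0, 8.1), (10.0, 6.0)]))
```

Because each spot is compared with its own first-dimension distance, GPs of different charge fall at different points on the diagonal but are still classified correctly, which is the point of the 2D layout.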



Sec. 16.0: The Next Variegation Cycle:



We now consider which residues of the PBD should be varied in the next variegation cycle. The general rule is to preserve as much accumulated information as possible. If the level of variegation in the previous variegation cycle was correctly chosen, then the amino acids selected at the residues just varied are the ones best determined. The environment of other residues has changed, so it is appropriate to vary them again. Because there are always more residues in the principal (Sec. 13.1.1) and secondary (Sec. 13.1.2) sets than can be varied simultaneously, we start by picking residues that either have never been varied (highest priority) or have not been varied for one or more cycles. If we find that varying all the residues except those varied in the previous cycle does not allow a high enough level of diversity, then residues varied in the previous cycle might be varied again. For example, if M_ntv (the number of independent transformants that can be produced from Y_D of DNA) and C_sensi (the sensitivity of the affinity separation) were such that seven residues could be varied, and if the principal and secondary sets contained 13 residues, we would always vary seven residues, even though that implies varying some residue twice in a row. In such cases, we would pick, among the residues just varied, those that contain the amino acids of highest abundance in the variegated codons used.
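The residue-picking rule can be sketched as a sort with a tiered priority, using the 13-residue, seven-per-cycle case as the example. The function, its inputs and the abundance values are hypothetical; ordering previously varied residues by how many cycles ago they were varied, oldest first, is an assumption beyond the text.

```python
def pick_next_residues(candidates, last_varied, current_cycle, n_vary, selected_abun):
    """Choose residues for the next variegation cycle.

    candidates:    residue ids in the principal and secondary sets
    last_varied:   residue -> cycle in which it was last varied (absent if never)
    selected_abun: residue -> abundance, in the codon mix used, of the amino acid
                   selected there in the previous cycle
    """
    def priority(res):
        if res not in last_varied:
            return (0, 0, 0.0)                    # never varied: highest priority
        age = current_cycle - last_varied[res]
        if age > 1:
            return (1, -age, 0.0)                 # varied earlier: oldest first
        # varied in the previous cycle: prefer high-abundance selected amino acids
        return (2, 0, -selected_abun.get(res, 0.0))

    return sorted(candidates, key=priority)[:n_vary]


candidates = list(range(1, 14))                   # 13 residues
last_varied = {r: 2 for r in range(1, 8)}         # residues 1-7 varied in cycle 2
last_varied.update({8: 1, 9: 1, 10: 1})           # residues 8-10 varied in cycle 1
selected_abun = {1: 0.025, 2: 0.066, 3: 0.054, 4: 0.060, 5: 0.044, 6: 0.030, 7: 0.050}
print(pick_next_residues(candidates, last_varied, 3, 7, selected_abun))
```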

It is the accumulation of information that allows the process to select those protein sequences that produce binding between the SBD and the target. Some interfaces between proteins and other molecules involve twenty or more residues. Complete variation of twenty residues would generate about 10^26 (20^20) different proteins. By dividing the residues that lie close together in space into overlapping groups of five to seven residues, we can vary a large surface but never need to test more than 10^ to 10^ candidates at once: a savings of 10^10 to 10^17 fold. The power of selection with accumulation of information is well illustrated in Chapter 3 of DAWK86.

Having picked the residues to vary, we again set the range of variegation for each residue according to the principles set forth in Sec. 13.2, design the vgDNA encoding the desired mutants (Sec. 13.3), clone the vgDNA into GPs (Sec. 14), and select-by-binding-to-target those GPs bearing SBDs (Sec. 15).
Sec. 17.0: OTHER CONSIDERATIONS:

Sec. 17.1: Joint selections:



One may modify the affinity separation of the method described to select a molecule that binds to material A but not to material B. One needs to prepare two selection columns, one with material A and the other with material B. The population of genetic packages is prepared in the manner described, but before applying the population to A, one passes the population over the B column so as to remove those members of the population that have high affinity for B ("reverse affinity chromatography"). In the preceding specification, the initial column supported some other molecule simply to remove GP(PBD)s that displayed PBDs having indiscriminate affinity for surfaces.

It may be necessary to amplify the population that does not bind to B before passing it over A. Amplification would most likely be needed if A and B were in some ways similar and the PPBD had been selected for having affinity for A. The optimum order of interactions might be determined empirically.

For example, to obtain an SBD that binds A but not B, three columns could be connected in series: a) a column supporting some compound that is neither A nor B, or only the matrix material, b) a column supporting B, and c) a column supporting A. A population of GP(vgPBD)s is applied to the series of columns, and the columns are washed with the buffer of constant ionic strength that is used in the application. The columns are uncoupled, and the third column is eluted with a gradient to isolate GP(vgPBD)s that bind A but not B.
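A toy model of the three-column series treats each column as a filter that retains packages whose affinity score for its ligand reaches a threshold; only packages that pass columns (a) and (b) can be captured on column (c). The scores, the threshold and `series_select` are hypothetical and stand in for real binding behavior.

```python
def series_select(affinities, threshold=1.0):
    """Return packages captured on column (c) of the (a) -> (b) -> (c) series.

    `affinities` maps package id -> {"matrix": x, "B": y, "A": z} (hypothetical
    scores); a package stays on the first column whose ligand it binds at >= threshold.
    """
    bind_a_not_b = []
    for gp, aff in affinities.items():
        if aff["matrix"] >= threshold:   # retained on column (a): other compound/matrix
            continue
        if aff["B"] >= threshold:        # retained on column (b): supports B
            continue
        if aff["A"] >= threshold:        # retained on column (c): supports A
            bind_a_not_b.append(gp)
    return bind_a_not_b


population = {
    "gp1": {"matrix": 0.1, "B": 0.2, "A": 2.0},   # binds A only
    "gp2": {"matrix": 0.1, "B": 1.5, "A": 2.0},   # binds A and B
    "gp3": {"matrix": 3.0, "B": 0.0, "A": 2.0},   # sticky
    "gp4": {"matrix": 0.0, "B": 0.0, "A": 0.2},   # binds nothing
}
print(series_select(population))
```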

One can also generate molecules that bind to both A and B. In this case we can use a 3D model and mutate one face of the molecule in question to get binding to A. One can then mutate a different face to produce binding to B. When an SBD binds at least somewhat to both A and B, one can mutate the chain by Diffuse Mutagenesis to refine the binding and use a sequential joint selection for binding to both A and B.

The materials A and B could be proteins that differ at only one or a few residues. For example, A could be a natural protein for which the gene has been cloned, and B could be a mutant of A that retains the overall 3D structure of A. SBDs selected to bind A but not B must bind to A near the residues that are mutated in B. If the mutations were picked to be in the active site of A (assuming A has an active site), then an SBD that binds A but not B will bind to the active site of A and is likely to be an inhibitor of A.

To obtain a protein that will bind to both A and B, we can, alternatively, first obtain an SBD that binds A and a different SBD that binds B. We can then combine the genes encoding these domains so that a two-domain, single-polypeptide protein is produced. The fusion protein will have affinity for both A and B because one of its domains binds A and the other binds B.
One can also generate binding proteins with affinity for both A and B, such that these materials will compete for the same site on the binding protein. We guarantee competition by overlapping the sites for A and B. Using the procedures of the present invention, we first create a molecule that binds to target material A. We then vary a set of residues defined as: a) those residues that were varied to obtain binding to A, plus b) those residues close in 3D space to the residues of set (a) but that are internal and so are unlikely to bind directly to either A or B. Residues in set (b) are likely to make small changes in the positioning of the residues in set (a) such that the affinities for A and B will be changed by small amounts. Members of these populations are selected for affinity to both A and B.
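Set (b) can be chosen mechanically from a 3D model. The following sketch assumes one representative coordinate per residue and a fractional solvent accessibility per residue; the 6 Å distance cutoff, the 10% accessibility threshold, the function name, and the toy data are all assumptions for illustration, not values given in this specification.

```python
import math


def buried_neighbors(coords, accessibility, set_a, cutoff=6.0, max_access=0.10):
    """Residues outside set (a) that are near set (a) in 3D and mostly buried.

    coords: residue number -> (x, y, z) of one representative atom, in angstroms.
    accessibility: residue number -> fractional solvent accessibility (0..1).
    """
    set_b = []
    for res, xyz in coords.items():
        if res in set_a or accessibility[res] > max_access:
            continue  # already varied, or too exposed to count as internal
        if any(math.dist(xyz, coords[a]) <= cutoff for a in set_a):
            set_b.append(res)
    return sorted(set_b)


# Toy data (hypothetical): residue 1 is in set (a).
coords = {1: (0.0, 0.0, 0.0), 2: (3.0, 0.0, 0.0), 3: (20.0, 0.0, 0.0), 4: (0.0, 4.0, 0.0)}
accessibility = {1: 0.60, 2: 0.05, 3: 0.00, 4: 0.50}
print(buried_neighbors(coords, accessibility, set_a={1}))  # [2]
```

The residues varied in the next round would then be set (a) together with the returned set (b).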

Sec. 17.2: Selection for non-binding:

The method of the present invention can be used to select proteins that do not bind to selected targets. Consider a protein of pharmacological importance, such as streptokinase, that is antigenic to an undesirable extent. We can take the pharmacologically important protein as IPBD and antibodies against it as target. Residues on the surface of the pharmacologically important protein would be variegated, and GP(PBD)s that do not bind to an antibody column would be collected and cultured. Surface residues may be identified in several ways, including: a) from a 3D structure, b) from hydrophobicity considerations, or c) by chemical labeling. The 3D structure of the pharmacologically important protein remains the preferred guide to picking residues to vary, except that now we pick residues that are widely spaced so that we leave as little as possible of the original surface unaltered.
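One simple way to pick widely spaced surface residues is a greedy pass over the most exposed residues, skipping any residue too close to one already chosen. This is a sketch under stated assumptions: the 30% exposure threshold, the 10 Å minimum separation, the function name, and the toy coordinates are hypothetical, and the specification does not prescribe a particular algorithm.

```python
import math


def pick_spaced_residues(coords, accessibility, min_access=0.30, min_sep=10.0):
    """Greedy choice of widely spaced surface residues.

    Residues with fractional accessibility >= min_access are visited from most
    to least exposed; a residue is kept only if it is at least min_sep angstroms
    from every residue already kept.
    """
    surface = [r for r in coords if accessibility[r] >= min_access]
    surface.sort(key=lambda r: accessibility[r], reverse=True)
    picked = []
    for r in surface:
        if all(math.dist(coords[r], coords[p]) >= min_sep for p in picked):
            picked.append(r)
    return picked


coords = {1: (0.0, 0.0, 0.0), 2: (5.0, 0.0, 0.0), 3: (12.0, 0.0, 0.0), 4: (0.0, 15.0, 0.0)}
accessibility = {1: 0.9, 2: 0.8, 3: 0.7, 4: 0.1}
print(pick_spaced_residues(coords, accessibility))  # [1, 3]
```

Residue 2 is skipped because it lies within 10 Å of residue 1, and residue 4 is skipped because it is not exposed enough to count as surface.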

Destroying binding frequently requires only that a single amino acid in the binding interface be changed. If polyclonal antibodies are used, we face the problem that all or most of the strong epitopes must be altered in a single molecule. Preferably, one would have a set of monoclonal antibodies, or a narrow range of antibody species. If we had a series of monoclonal antibody columns, we could obtain one or more mutations that abolish binding to each monoclonal antibody. We could then combine some or all of these mutations in one molecule to produce a pharmacologically important protein recognized by none of the monoclonal antibodies. Such mutants must be tested to verify that the pharmacologically interesting properties have not been altered to an unacceptable degree by the mutations.
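Combining the mutations obtained from a series of monoclonal antibody columns amounts to choosing one escape mutation per antibody without asking any residue to carry two different amino acids. The backtracking search below is a minimal sketch of that bookkeeping; the data layout, the function name, and the example mutations are hypothetical, and any combined mutant would still need the functional testing described above.

```python
def combine_escape_mutations(escapes):
    """Pick one escape mutation per monoclonal antibody without conflicts.

    escapes: antibody -> list of (residue, amino_acid) substitutions found to
    abolish binding to that antibody. Returns residue -> amino_acid for the
    combined mutant, or None if every combination puts two different amino
    acids at some residue.
    """
    antibodies = list(escapes)

    def search(i, chosen):
        if i == len(antibodies):
            return dict(chosen)
        for residue, aa in escapes[antibodies[i]]:
            if chosen.get(residue, aa) != aa:
                continue  # this residue is already assigned a different amino acid
            newly_added = residue not in chosen
            chosen[residue] = aa
            result = search(i + 1, chosen)
            if result is not None:
                return result
            if newly_added:
                del chosen[residue]  # backtrack
        return None

    return search(0, {})


escapes = {  # hypothetical column results
    "mAb1": [(15, "A"), (17, "E")],
    "mAb2": [(15, "D"), (39, "E")],
    "mAb3": [(17, "Q")],
}
print(combine_escape_mutations(escapes))  # {15: 'A', 39: 'E', 17: 'Q'}
```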

Typically, polyclonal antibodies display a range of binding constants for antigen. Even if we have only polyclonal antibodies that bind to the pharmacologically important protein, we may proceed as follows. We engineer the pharmacologically important protein to appear on the surface of a replicable GP. We introduce mutations into residues that are on the surface of the pharmacologically important protein, or into residues thought to be on its surface, so that a population of GPs is obtained. Polyclonal antibodies are attached to a column, and the population of GPs is applied to the column at low salt. The column is eluted with a salt gradient. The GPs that elute at the lowest concentration of salt are those which bear pharmacologically important proteins that have been mutated in a way that eliminates binding to the antibodies having maximum affinity for the pharmacologically important protein. The GPs eluting at the lowest salt are isolated and cultured. The isolated SBD becomes the PPBD for further rounds of variegation so that the antigenic determinants are successively eliminated.
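The logic of picking the earliest-eluting GPs can be sketched with an assumed, idealized model in which a GP elutes at the salt concentration needed to release the highest-affinity antibody that still recognizes it. The epitope labels, the elution values, and the function names below are hypothetical.

```python
def elution_salt(mutated_epitopes, antibody_salt):
    """Salt (M) at which a GP is assumed to elute from the polyclonal column.

    antibody_salt: epitope -> salt concentration needed to release the
    antibodies against that epitope (higher for higher affinity).
    A GP whose epitope has been mutated away is no longer held by those antibodies.
    """
    holding = [salt for epitope, salt in antibody_salt.items()
               if epitope not in mutated_epitopes]
    return max(holding, default=0.0)


def earliest_eluting(population, antibody_salt):
    """Return the GPs that elute at the lowest salt concentration."""
    salts = {gp: elution_salt(eps, antibody_salt) for gp, eps in population.items()}
    lowest = min(salts.values())
    return sorted(gp for gp, salt in salts.items() if salt == lowest)


antibody_salt = {"E1": 0.8, "E2": 0.5, "E3": 0.3}   # hypothetical epitopes
population = {"gp1": {"E2"}, "gp2": {"E1"}, "gp3": set()}
print(earliest_eluting(population, antibody_salt))  # ['gp2']
```

In this toy model gp2 has lost the strongest determinant (E1), so it elutes first; in the next round it would serve as the PPBD, and E2 would then set its elution point.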



Sec. 17.3: Selection of PBDs for retention of structure:

Let us take an SBD with known affinity for a target as PPBD to a variegation of a region of the PBD that is far from the residues that were varied to create the SBD. We can use the target as an affinity molecule to select the PBDs that retain binding for the target, and that presumably retain the underlying structure of the IPBD. The variegations in this case could include insertions and deletions that are likely to disrupt the IPBD structure. We could also use the IPBD and AfM(IPBD) in the same way.

For example, if IPBD were BPTI and AfM(BPTI) were trypsin, we could introduce four or five additional residues after residue 26 and select GPs that display PBDs having specific affinity for AfM(BPTI). Residue 26 is chosen because it is in a turn and because it is about 25 Å from K15, a key amino acid in binding to trypsin.
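The residue numbering in this example can be checked against the mature BPTI sequence. The sketch below uses the published 58-residue sequence of mature bovine BPTI (taken from the general literature, not from this section); the helper name and the placeholder insert `XXXX` are hypothetical. It shows that an insertion after residue 26 leaves K15 in place.

```python
MATURE_BPTI = "RPDFCLEPPYTGPCKARIIRYFYNAKAGLCQTFVYGGCRAKRNNFKSAEDCMRTCGGA"


def insert_after(seq, residue_number, insert):
    """Insert `insert` after the 1-based residue `residue_number`."""
    return seq[:residue_number] + insert + seq[residue_number:]


variant = insert_after(MATURE_BPTI, 26, "XXXX")   # XXXX: placeholder for variegated residues
print(len(MATURE_BPTI), len(variant))             # 58 62
print(variant[14], variant[25], variant[26:30])   # K K XXXX
```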

The underlying structure is most likely to be retained if insertions or deletions are made at loops or turns.
Sec. 17.4: Binding proteins not unique:



For each target, there is a large number of SBDs that may be found by the method of the present invention. The process relies on a combination of protein structural considerations, probabilities, and targeted mutations with accumulation of information. To increase the probability that some PBD in the population will bind to the target, we generate as large a population as we can conveniently subject to selection-through-binding in one experiment. Key questions in management of the method are "How many transformants can we produce?" and "How small a component can we find through selection-through-binding?". Geneticists routinely find mutations with frequencies of one in 10^... using simple, powerful selections; we experimentally determine the sensitivity of our procedure. The optimum level of variegation is determined by the maximum number of transformants and the selection sensitivity, so that for any reasonable sensitivity we may use a progressive process to obtain a series of proteins with higher and higher affinity for the chosen target material. Enrichments of 1000-fold by a single pass of elution from an affinity plate have been demonstrated (SMIT85). Three rounds of such enrichment could produce 10^9-fold enrichment, and additional rounds may be added if necessary.
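The enrichment arithmetic is multiplicative. The sketch below, with hypothetical function names, reproduces the three-round figure and also applies the 900-fold per-pass value from the list of assumed capabilities in the example that follows. It assumes passes act independently, which is an idealization; each pass should be measured in practice.

```python
def cumulative_enrichment(per_pass, passes):
    """Overall enrichment after repeated passes, assuming each pass multiplies it."""
    return per_pass ** passes


def passes_needed(target, per_pass):
    """Smallest number of passes whose cumulative enrichment reaches `target`."""
    passes, total = 0, 1
    while total < target:
        total *= per_pass
        passes += 1
    return passes


print(cumulative_enrichment(1000, 3))   # 1000000000
print(passes_needed(10**9, 1000))        # 3
print(passes_needed(10**9, 900))         # 4
```

With 900-fold per pass, three passes give only 7.29 x 10^8, so a fourth pass is needed to reach 10^9.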

Use of different variation schemes can yield different binding proteins. For any given target, there is a large plurality of proteins that will bind to it. Thus, if one binding protein turns out to be unsuitable for some reason (e.g. too antigenic), the procedure can be repeated with different variation parameters. For example, one might choose different residues to vary or pick a different nt distribution at variegated codons so that a new distribution of amino acids is tested at the same residues. Even if the same principal set of residues is used, one might obtain a different SBD if the order in which one picks subsets to be varied is altered.
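To see how a chosen nt distribution translates into an amino-acid distribution at a variegated codon, one can enumerate all codons and weight each by the product of its base frequencies. The sketch below uses the standard genetic code; the NNK distribution (N = equal A, C, G, T; K = equal G, T) is a common illustrative choice and is not taken from this section.

```python
from itertools import product

BASES = "TCAG"
# Standard genetic code, codons ordered TTT, TTC, TTA, TTG, TCT, ... GGG; '*' = stop.
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)}


def aa_distribution(first, second, third):
    """Amino-acid probabilities from independent base frequencies at the three codon positions."""
    dist = {}
    for (b1, f1), (b2, f2), (b3, f3) in product(first.items(), second.items(), third.items()):
        aa = CODON_TABLE[b1 + b2 + b3]
        dist[aa] = dist.get(aa, 0.0) + f1 * f2 * f3
    return dist


N = {"A": 0.25, "C": 0.25, "G": 0.25, "T": 0.25}
K = {"G": 0.5, "T": 0.5}
nnk = aa_distribution(N, N, K)
print(len(nnk), nnk["*"] * 32, nnk["L"] * 32)  # 21 1.0 3.0
```

NNK encodes all 20 amino acids plus one stop codon (TAG) among 32 codons; changing the base frequencies changes which amino acids are favored at the same residue.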



Sec. 17.5: Other modes of mutagenesis possible:

The modes of creating diversity in the population of GPs discussed herein are not the only modes possible. Any method of mutagenesis that preserves at least a large fraction of the information obtained from one selection and then introduces other mutations in the same domain will work. The limiting factors are the number of independent transformants that can be produced and the amount of enrichment one can achieve through affinity separation. Therefore the preferred embodiment uses a method of mutagenesis that focuses mutations into those residues that are most likely to affect the binding properties of the PBD and are least likely to destroy the underlying structure of the IPBD.

Other modes of mutagenesis might allow other GPs to be considered. For example, the bacteriophage lambda is not a useful cloning vehicle for cassette mutagenesis because of the plethora of restriction sites. One can, however, use single-stranded-oligo-nt-directed mutagenesis on lambda without the need for unique restriction sites. No one has used single-stranded-oligo-nt-directed mutagenesis to introduce the high level of diversity called for in the present invention, but if it is possible, such a method would allow use of phage with large genomes.
BPTI-Derived Binding Protein for HHMb: Displayed by M13 Phage

Presented below is a hypothetical example of a protocol for developing a new binding molecule derived from BPTI with affinity for horse heart myoglobin (HHMb) using the common E. coli bacteriophage M13 as genetic package. It will be understood that some further optimization, in accordance with the teachings herein, may be necessary to obtain the desired results. Possible modifications in the preferred method are discussed immediately following various steps of the hypothetical example.



By hypothesis, we set the following technical capabilities:

500 ng/synthesis of ssDNA 100 bases long,
10 µg/synthesis of ssDNA 60 bases long,
1 ng/synthesis of ssDNA 20 bases long.

ino baces
1 mg/i

0.1% for blunt-blunt,
4% for sticky-blunt,
11% for sticky-sticky.

5 x 10^...

Cg: 900-fold enrichment
C_sensi: 1 in 4 x 10^...
chrom: 10 passes
S_err: 0.05



Example I, Part I

In this example, we will use M13 as a replicable GP and BPTI as IPBD. The considerations that lead to these choices are discussed. In Part I, we are concerned only with getting BPTI displayed on the outer surface of an M13 derivative. Variable DNA may be introduced in the osp-ipbd gene, but not within the region that codes for the trypsin-binding region of BPTI. Once BPTI is displayed on the outer surface of an M13 derivative, we proceed to Part II to optimize the affinity separation procedures.

We consider various GPs and, for this example, choose a filamentous bacteriophage of E. coli, M13. We prefer phage over vegetative bacterial cells because phage are much less metabolically active. We prefer phage over spores because the molecular mechanisms of virion formation and the 3D structure of the virion are much better understood than are the corresponding processes of spore formation and structures of spores.

M13 is a very well studied bacteriophage, widely used for DNA sequencing and as a genetic vector; it is a typical member of the class of filamentous phages. The relevant facts about M13 and other phages that will allow us to choose among phages are cited in Sec. 1.3.1.

Compared to other bacteriophage, filamentous phage in general are attractive and M13 in particular is especially attractive because:

1) the 3D structure of the virion is known,
2) the processing of the coat protein is well understood,
3) the genome is expandable,
4) the genome is small,
5) the sequence of the genome is known,
6) the virion is physically resistant to shear, heat, cold, guanidinium Cl, low pH, and high salt,
7) the phage is a sequencing vector so that sequencing is especially easy, and
8) antibiotic-resistance genes have been cloned into the genome with predictable results (HINE80).

Other criteria listed in Sec. 1.0 and 1.3 are also satisfied: M13 is easily cultured and stored (FRIT85), each infected cell yielding 100 to 1000 M13 progeny after infection. M13 has no unusual or expensive media requirements and is easily harvested and concentrated (SALI64, FRIT85). M13 is stable toward physical agents: temperature (10% of phage survive 30 minutes at 85°C), shear (a Waring blender does not kill it), desiccation (not applicable), radiation (not applicable), and age (stable for years).

M13 is stable toward chemicals: pH (< 2.2 ...) (SMIT33), surface active agents (not applicable), chaotropes (guanidinium HCl = 6.0 M), ions (no sensitivities), organic solvents (ether and other organic solvents are lethal (KARV7S)), and proteases (not applicable; HHMb is not a protease). M13 is not known to be sensitive to other enzymes.

The M13 genome is about 6.4 kb and its sequence is known (SCHA78). Because the genome is small, cassette mutagenesis is practical on RF M13, as is single-stranded oligo-nt directed mutagenesis (FRIT85). M13 is a plasmid and transformation system in itself, and an ideal sequencing vector. M13 can be grown on Rec- strains of E. coli. The M13 genome is expandable (HESS73, FRIT85). M13 confers no advantage on infected cells, but does not lyse them. The sequence of gene VIII is known, and the amino acid sequence can be encoded on a synthetic gene, using the lacUV5 promoter in conjunction with the LacI repressor. The lacUV5 promoter is induced by IPTG. Gene VIII protein is secreted by a well studied process and is cleaved between A23 and A24. Residues 18, 21, 22, and 23 of gene VIII protein control cleavage. Mature gene VIII protein makes up the sheath around the circular ssDNA. The 3D structure of the f1 virion is known at medium resolution; the amino terminus of gene VIII protein is on the surface of the virion. No fusions to M13 gene VIII protein have been reported. The 2D structure of M13 coat protein is implicit in the 3D structure. Mature M13 gene VIII protein has only one domain. There are four minor proteins: gene III, VI, VII, and IX. Each of these minor proteins is present in about 5 copies per virion and is related to morphogenesis or infection. The major coat protein is present in more than 2500 copies per virion.

Although no fusions of M13 gene VIII to other genes have been reported, knowledge of the virion 3D structure makes attachment of IPBD to the amino terminus of mature M13 coat protein (M13 CP) quite attractive (See Sec. 1.3.2). Should direct fusion of BPTI to M13 CP fail to cause BPTI to be displayed on the surface of M13, we will vary part of the BPTI sequence and/or insert short random DNA sequences between BPTI and M13 CP (Sec. 1.3).

Smith (SMIT85) and de la Cruz et al. (CRUZ88) have shown that insertions into gene III cause novel protein domains to appear on the virion outer surface. If BPTI cannot be made to appear on the virion outer surface by fusing the bpti gene to gene VIII, we will fuse bpti to gene III, either at the site used by Smith and by de la Cruz et al. or to one of the termini. We will use a second, synthetic copy of gene III so that some unaltered gene III protein will be present.
The gene VIII protein is chosen as OSP because it is present in many copies and because its location and orientation in the virion are known. Note that any uncertainty about the azimuth of the coat protein about its own alpha helical axis is unimportant; the amino terminus is exposed for all azimuths.

The 3D model of f1 indicates strongly that fusing BPTI to the amino terminus of M13 CP is more likely to yield a functional protein than any other fusion site (See Sec. 1.3.3).

The amino-acid sequence of M13 pre-coat (SCHA78), called AA_seq1, is

AA_seq1

1  MKKSLVLKAS VAVATLVPML SFA
                            ↓
24 AEGDDPAKAA FNSLQASATE YIGYAWAMVV VIVGATIGIK LFKKFTSKAS  73
The single-letter codes for amino acids and the codes for ambiguous DNA are given in Table 1. The best site for inserting a novel protein domain into M13 CP is after A23 because SP-I cleaves the precoat protein after A23, as indicated by the arrow. Proteins that can be secreted will appear connected to mature M13 CP at its amino terminus. Because the amino terminus of mature M13 CP is located on the outer surface of the virion, the introduced domain will be displayed on the outside of the virion.
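The insertion point after A23 can be made concrete with the AA_seq1 sequence. This sketch splices a domain between the signal peptide (residues 1-23) and mature coat protein, then models signal peptidase I cleavage as simple removal of the first 23 residues. The mature BPTI sequence is the published one for bovine BPTI (not given in this section), and the names are hypothetical; real processing of such a chimera has to be verified experimentally.

```python
PRECOAT = "MKKSLVLKASVAVATLVPMLSFAAEGDDPAKAAFNSLQASATEYIGYAWAMVVVIVGATIGIKLFKKFTSKAS"
MATURE_BPTI = "RPDFCLEPPYTGPCKARIIRYFYNAKAGLCQTFVYGGCRAKRNNFKSAEDCMRTCGGA"
SIGNAL_LENGTH = 23  # SP-I cleaves after A23


def precoat_fusion(domain):
    """Insert a domain between the signal peptide and mature M13 coat protein."""
    return PRECOAT[:SIGNAL_LENGTH] + domain + PRECOAT[SIGNAL_LENGTH:]


def after_spi_cleavage(precursor):
    """Idealized processing: SP-I removes residues 1-23."""
    return precursor[SIGNAL_LENGTH:]


mature_chimera = after_spi_cleavage(precoat_fusion(MATURE_BPTI))
print(after_spi_cleavage(PRECOAT)[:6])          # AEGDDP
print(mature_chimera[:4], len(mature_chimera))  # RPDF 108
```

The displayed product is mature BPTI (58 residues) followed by the 50-residue mature coat protein.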

BPTI is chosen as IPBD of this example (See Sec. 2.1) because it meets or exceeds all the criteria: it is a small, very stable protein with a well known 3D structure. Marks et al. (MARK86) have shown that a fusion of the phoA signal peptide gene fragment and DNA coding for the mature form of BPTI caused native BPTI to appear in the periplasm of E. coli, demonstrating that there is nothing in the structure of BPTI to prevent its being secreted.

Marks et al. (MARK87) also showed that the structure of BPTI is stable even to the removal of one of the cystine bridges. They did this by replacing both C14 and C38 with either two alanines or two threonines. The C14/C38 cystine bridge that Marks et al. removed is the one very close to the scissile bond in BPTI; surprisingly, both mutant molecules functioned as trypsin inhibitors. This indicates that BPTI is redundantly stable and so is likely to fold into approximately the same structure despite numerous surface mutations. Using the knowledge of homologues, vide infra, we can infer which residues must not be varied if the basic BPTI structure is to be maintained.



The 3D structure of BPTI has been determined at high resolution by X-ray diffraction (HUBE77, MARQ83, WLOD84, WLOD87a, WLOD87b), neutron diffraction (WLOD84), and NMR (WAGN87). In one of the X-ray structures deposited in the Brookhaven Protein Data Bank, "6PTI", there was no electron density for A58, indicating that A58 has no uniquely defined conformation. Thus we know that the carboxy group does not make any essential interaction in the folded structure. The amino terminus of BPTI is very near to the carboxy terminus. Goldenberg and Creighton reported on circularized BPTI and circularly permuted BPTI (GOLD83). Some proteins homologous to BPTI have more or fewer residues at either terminus.

BPTI has been called "the hydrogen atom of protein folding" and has been the subject of numerous experimental and theoretical studies (STAT87, SCHW87, GOLD83, CHAZ83).



BPTI has the added advantage that at least 32 homologous proteins are known, as shown in Table 13. A tally of ionizable groups is shown in Table 14, and the composite of amino acid types occurring at each residue is shown in Table 15.

BPTI is freely soluble and is not known to bind metal ions. BPTI has no known enzymatic activity. BPTI binds to trypsin with a dissociation constant of 6.0 x 10^-14 M (TSCH87).

BPTI is not toxic. If K15 of BPTI is changed to L, there is no measurable binding between the mutant BPTI and trypsin (TSCH87).

Stereo Figure 7 shows the alpha carbons of BPTI plus the side groups of conserved residues; all four atoms of conserved glycines are shown. All of the conserved residues are buried; of the seven fully conserved residues only G37 has noticeable exposure. The solvent accessibility of each residue in BPTI is given in Table 16, which was calculated from the entry "6PTI" in the Brookhaven Protein Data Bank with a solvent radius of 1.4 Å, the atomic radii given in Table 7, and the method of Lee and Richards (LEE71). Each of the 51 non-conserved residues can accommodate two or more kinds of amino acids. By independently substituting at each residue only those amino acids already observed at that residue, we could obtain approximately 7 x 10^... different amino acid sequences, most of which will fold into structures very similar to BPTI.
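The count quoted for independent substitution is the product, over the 51 non-conserved residues, of the number of amino-acid types observed at each one. The per-residue counts would come from Table 15, which is not reproduced here, so the sketch below uses made-up counts purely to show the calculation; the function name is hypothetical.

```python
import math


def sequence_count(types_per_residue):
    """Number of sequences when each residue independently takes any observed type."""
    return math.prod(types_per_residue)


# Hypothetical counts for illustration only (not the Table 15 values).
print(sequence_count([2, 3, 4]))                        # 24
print(round(math.log10(sequence_count([2] * 51)), 2))   # 15.35
```

Even the minimum of two types at each of the 51 residues gives 2^51, about 2 x 10^15 sequences, which shows why the total is dominated by the residues that tolerate many types.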



BPTI will be useful as an IPBD for macromolecules (See Sec. 2.1.1). BPTI and BPTI homologues bind tightly and with high specificity to a number of enzymes.

BPTI is strongly positively charged except at very high pH; thus BPTI is useful as IPBD for targets that are not also strongly positive under the conditions of intended use (see Sec. 2.1.2). There exist homologues of BPTI, however, having quite different charges (viz. SCI-III from Bombyx mori at -7 and the trypsin inhibitor from bovine colostrum at -1). Once a derivative of M13 is found that displays BPTI on its surface, the sequence of the BPTI domain can be replaced by one of the homologous sequences to produce acidic or neutral IPBDs.

BPTI is not an enzyme (See Sec. 2.1.3). BPTI is quite small; if this should cause a pharmacological problem, two or more BPTI-derived domains may be joined as in the human BPTI homologue that has two domains.

A derivative of M13 is the preferred OCV (See Sec. 3). Wild-type M13 does not confer any resistances on infected cells; M13 is a pure parasite. A "phagemid" is a hybrid between a phage and a plasmid, and is used in this invention. Double-stranded plasmid DNA isolated from phagemid-bearing cells is denoted by the standard convention, e.g. pXY24. Phage prepared from these cells would be designated XY24. Phagemids such as Bluescript K/S (sold by Stratagene) are not suitable for our purposes because Bluescript does not contain the full genome of M13 and must be rescued by coinfection with competent wild-type M13. Such coinfections will likely lead to genetic recombination yielding heterogeneous phage unsuitable for the purposes of the present invention.

It is also well known that plasmids containing the ColE1 origin of replication can be greatly amplified if protein synthesis is halted in log-phase culture. Protein synthesis can be halted by addition of chloramphenicol or other agents (MANI82).
The bacteriophage M13 bla 61 (ATCC 37039) is derived from wild-type M13 through the insertion of the beta-lactamase gene (HINE80). This phage contains 8.11 kb of DNA. M13 bla cat 1 (ATCC 3ro-;0) is derived from M13 bla 61 through the additional insertion of the chloramphenicol resistance gene (HINE80); M13 bla cat 1 contains 0,S8 kb of DNA. Although neither of these variants of M13 contains the ColE1 origin of replication, either could be used as a starting point to construct a usable cloning vector for the present example.

The OCV for the current example is constructed by a process illustrated in Figure 8. A brief description of all the plasmids and phagemids constructed for this Example is found in Table 17.

For ss oligo-nt site-directed mutagenesis, multiple primers lead to higher efficiency. Three non-mutagenic primers are used:



5'(2332) GGC GGC TCT GAG GGT GGC GGT (2352) 3'  wtM13
      3' ccg ccg aga ctc cca ccg cca 5'  olig#24,



5'(4855) CCT GCT GGC TCT CAC CCC CCC (4875) 3'  wtM13
      3' eg.-, cga ccg aga gtc gc^ -.'c^ 5'  olig#25,



and 



5'(3451) CCG GTG AGC GTG GGT CTC GCG (3431) 3'

      3' ggc cac tcg cac cca gag cgc 5'  olig#26.

Olig#24 is complementary to a segment near the end of M13 gene III, and olig#25 is complementary to part of M13 gene IV (SCHA78). Olig#26 is part of the amp^R gene from pBR322 (MANI82, Appendix D); the numbers shown refer to pBR322 base pair numbers. Note that pLG2 and its derivatives carry the anti-sense strand of the amp^R gene in the + DNA strand. The segments are picked to be high in GC content and to divide the pLG7 genome into several segments of approximately equal length.
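Two properties of these primers, complementarity to the viral + strand and high GC content, are easy to check in code. This sketch uses the olig#24 pair (M13 bases 2332-2352), reading the + strand as GGC GGC TCT GAG GGT GGC GGT and the oligo, written 3' to 5', as ccg ccg aga ctc cca ccg cca; the function names are hypothetical.

```python
COMPLEMENT = str.maketrans("ACGTacgt", "TGCAtgca")


def reverse_complement(seq):
    """Reverse complement of a DNA sequence written 5'->3'."""
    return seq.translate(COMPLEMENT)[::-1]


def gc_fraction(seq):
    """Fraction of G plus C bases."""
    s = seq.upper()
    return (s.count("G") + s.count("C")) / len(s)


template = "GGCGGCTCTGAGGGTGGCGGT"         # M13 + strand, 5'->3'
olig24_3to5 = "ccgccgagactcccaccgcca"      # olig#24 as written, 3'->5'
olig24 = olig24_3to5[::-1].upper()         # rewrite the primer 5'->3'

print(olig24 == reverse_complement(template))  # True
print(round(gc_fraction(olig24), 3))           # 0.762
```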

The genetic engineering procedures needed to construct the OCV are standard. All restriction digests use commercially available enzymes and are carried out under conditions recommended by the supplier. All restriction fragments of DNA are purified by HPLC or electrophoresis from agarose gels as described elsewhere in the present invention. Competent E. coli are preferably prepared by a modified version of the procedure of Maniatis (MANI82) given in the generic detail section. M13 and its engineered derivatives are infected into E. coli strain PE3c-i (F+, Rec-, Sup, Amp^S). Plasmid DNA of M13 derivatives is transferred into E. coli strain PE333 (F-, Rec-, Sup, Amp^S) so that we avoid multiple infections that might arise once phage are produced. Isolation of M13 phage is by the procedure of Salivar et al. (SALI64); isolation of replicative form (RF) M13 is by the procedure of Jazwinski et al. (JAZW73a and JAZW73b). Isolation of plasmids containing the ColE1 origin of replication is by the method of Maniatis (MANI82).

DNA sequencing is by the method of Sanger (AUSU87). Virions of M13 derivatives contain circular ssDNA that is called the viral + strand. Base numbers are assigned from an agreed origin and in ascending order in the 5'-to-3' direction of the viral + strand. Conventionally, this DNA is drawn with the 5'-to-3' direction clockwise and corresponding to increasing base number. In relation to the genomes of M13 derivatives, we will use "up" or "above" to mean higher base number or further along in the clockwise direction. Similarly, "down" and "below" will mean lower base number or further along in the counterclockwise direction. To determine the base sequence of part of an M13 derivative, one needs a sequencing primer that is complementary to a region above and within about 100 bases of the region to be sequenced. Because the OCV is constructed from parts of M13mp18, parts of pBR322, and synthetic DNA, the sequence of flanking regions is always known.
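The "up/above" convention and the 100-base rule for sequencing primers amount to modular arithmetic on the circular genome. In this sketch the distance is measured from the highest-numbered base of the region to be sequenced (`region_end`); the function names are hypothetical, and the 7249-base length of M13mp18 is used only as an example circle.

```python
def bases_above(region_end, site, genome_length):
    """Clockwise ("up") distance from region_end to site on a circular genome."""
    return (site - region_end) % genome_length


def primer_site_ok(region_end, primer_site, genome_length, window=100):
    """True if the primer site is above the region and within `window` bases of it."""
    return 0 < bases_above(region_end, primer_site, genome_length) <= window


GENOME = 7249  # M13mp18 length, used only as an example
print(primer_site_ok(7200, 30, GENOME))   # True: 79 bases above, across base 1
print(primer_site_ok(500, 450, GENOME))   # False: the site is below the region
print(primer_site_ok(500, 650, GENOME))   # False: 150 bases above
```

The modulo operation handles regions near the arbitrary origin, where "above" wraps from the last base back to base 1.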

We pick the amp^R gene from pBR322 as a convenient antibiotic resistance gene. Another resistance gene, such as the kanamycin-resistance gene, could be used. (The New England BioLabs 1988/89 catalogue contains a genetic map of pBR322 on page 106.) The plasmid pBR322 also contains the ColE1 origin of replication. The restriction sites AccI at 2246 and AatII at 4286 are the most convenient places to cut pBR322 to obtain both an intact amp^R gene and the ColE1 origin of replication with ends suitable for ligation to other DNA.

The plasmid pBR322 contains a unique AlwNI site at base 2886 that is between the amp^R gene and ori. There is a unique AlwNI site in M13mp18 at base 2137. When the AccI-to-AatII fragment of pBR322 is ligated into M13mp18, there will be two AlwNI sites and no easy way to excise the amp^R gene. Thus we convert the AlwNI site of pBR322 into an XbaI site that will be unique in all the DNA constructs of the present example. The two oligo-nts:
5' ccqiiTCTACActiagCcqCCA 3'  olig#60

3' CGTagccACATCTaaccagc 5'  olig#61



are synthesized by standard methods and annealed. The AlwNI site at base 2886 in pBR322 has the sequence 5'-CAGCCACTG-3'. Plasmid pBR322 is cut with AlwNI, mixed with the synthetic dsDNA, and ligated. Cells are transformed and selected for tetracycline resistance. Tetracycline-resistant colonies are screened for the correct insert by restriction digestion with XbaI, which cuts the correct construction but not pBR322. The correct construction is called pLG322. Plasmid pLG322 differs from pBR322 only by the replacement of the AlwNI site with an XbaI site.



The plasTnid pLC322 contains a second A-iv I 
restriction site at base 651 so that digestion, of 
20 pLG322 with Aat, 11 ar.d Acc I yields three fraqmcn-s, 
or.e of about 20-; 1 bases (that ve want), cho cf ■''.bojt 
723 bases, and one c: about IcOO bases. To facilit-ite 
^v'.^ isolation of the 20^ l-base Eracrt:ent, we al-jo diqcst 

t^: pLG322 with -Itv : th.=it cuts at base 13'69. The 5V/ I 

'"'"^ 2 5 cut redOces the IGOO-base franrx-nt to two fracrierius cf 

about 700 and about 9'JO bases each. Ws purify the 
2041-nt fragment by H?t-Z or agrircse gel 
-electrophoresis . 
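The fragment sizes follow from the cut positions on a circular molecule. This sketch uses the 4361-base length of pBR322 together with the site positions given above; because pLG322 differs from pBR322 only at the former AlwNI site, the results are approximate, which matches the "about" values in the text. The function name is hypothetical.

```python
def circular_fragments(length, cut_sites):
    """Fragment sizes from cutting a circular molecule at the given base positions."""
    sites = sorted(set(cut_sites))
    sizes = []
    for i, site in enumerate(sites):
        nxt = sites[(i + 1) % len(sites)]
        size = (nxt - site) % length
        sizes.append(size if size else length)   # a single cut just linearizes the circle
    return sizes


PBR322_LENGTH = 4361
ACC_I, AAT_II, STY_I = [651, 2246], 4286, 1369

print(circular_fragments(PBR322_LENGTH, ACC_I + [AAT_II]))          # [1595, 2040, 726]
print(circular_fragments(PBR322_LENGTH, ACC_I + [AAT_II, STY_I]))   # [718, 877, 2040, 726]
```

The extra StyI cut splits only the roughly 1600-base fragment, leaving the wanted fragment of about 2040 bases as the largest band.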



M13mp18, sold by New England BioLabs, contains neither AatII nor AccI sites. Therefore we insert an adaptor that allows us to insert the AatII-to-AccI fragment of pLG322 that carries the amp^R gene and the ColE1 origin of replication into a desirable place in M13mp18. M13mp18 contains a lacUV5 promoter and a lacZ gene that are not useful to the purposes of the present invention. By cutting M13mp18 with AvaII at the unique site at 5914 and with Bsu36I at the unique site at 6503 and discarding the approximately 600 intervening base pairs, we eliminate all recognition sites of the enzymes shown in Table 18 from M13mp18.

M13mp18 itself is not cut by the enzymes listed in Table 19. Among the enzymes in Tables 18 and 19, those listed in Table 20 have recognition sites within the AccI-to-AatII fragment of pLG322 that contains the amp^R gene and the ColE1 origin of replication.

Therefore the following adaptor is synthesized:

5' GACCCACCTCtgcctcGTATACCCCACCCcatagctCC 3'  oliqsi
3' CCTCCAGacggagCATATCCCCTCCCgtatcgaCGACT 5'  oligS2
   AvaII|AatII    AccI|RsrII     Bsu36I
where the AvaII and AatII sites share one GC base pair, and the AccI and RsrII sites share a different CG pair. The two 38-base oligo-nts are synthesized by a standard procedure described elsewhere in the present invention; the oligo-nts are annealed to each other. The bases shown in lower case are spacers. In a later step, we will cut this adaptor with both AatII and AccI; for both enzymes to cut efficiently, there must be at least five bases between the sites. Similarly, we will begin the construction of the osp-ipbd gene by inserting DNA at the RsrII and Bsu36I sites; thus these sites are separated by seven bases to allow simultaneous cuts.

The annealed adaptor is ligated with RF M13mp18 that has been cut with both AvaII and Bsu36I and purified by HPLC or polyacrylamide gel electrophoresis (PAGE). Cells are transformed with the ligated mix. DNA from colonies selected on LB agar with ampicillin is screened by restriction digestion. The desired construction can be cut with RsrII or AccI, but not by any of the enzymes listed in Table 18. Plasmid DNA from colonies that have the predicted restriction digestion pattern is sequenced in the region of the insert to verify the construction. This construction retains both the AvaII and the Bsu36I sites. The resulting construct is called pLG1.

The plasmid pLG1 is grown by standard techniques, and DNA is isolated and cut with both AatII and AccI. After ligation, there will still be AatII and AccI restriction sites at the ends of the inserted DNA. The AatII-to-AccI fragment of pBR322 is ligated to the backbone of pLG1. The ligated DNA is used to transform competent E. coli that are plated on ampicillin-containing plates after a short grow-out.
Ampicillin-resistant colonies are picked. Plasmid
DNA of the phagemid from the resistant colonies is
digested with Bsu36 I and Rsr II. To verify the
construction, DNA from phagemids with the correct
restriction digestion pattern is sequenced: a) from
about 20 bases above the Bsu36 I site to about 20 bases
below the Rsr II site, and b) for about 30 bases on
either side of the unique Ava II site. The correct
construct is named pLG2.

The Acc I restriction site is no longer needed for
vector construction. To eliminate this site, RF pLG2
dsDNA is cut with Acc I, treated with Klenow fragment
and dATP and dTTP to make it blunt, and then religated.
The ligated DNA is used to transform competent cells;
after a short grow-out, ampicillin-resistant colonies
are selected. Restriction digestion is used to screen
phagemid DNA from these colonies; the desired product
cannot be cut with Acc I. To verify the construction,
DNA from colonies lacking an Acc I restriction site is
sequenced from about 20 bases above the former Acc I
site to about 20 bases below it. The cloning vector,
named pLG3, is now ready for stepwise insertion of the
osp-ipbd gene.
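Filling in with only dATP and dTTP works because the Acc I overhang consists of A and T. The sketch below assumes the site is GTATAC, the only Acc I variant (GT^MKAC, a standard value assumed here) whose two-base overhang, AT, needs no other nucleotides; it shows that fill-in plus blunt religation duplicates the overhang and destroys the site. Function names are hypothetical.

```python
import re

ACC_I = re.compile("GT[AC][GT]AC")  # Acc I recognizes GT^MKAC (assumed standard value)
COMPLEMENT = {"A": "T", "T": "A", "G": "C", "C": "G"}


def blunt_and_religate(site: str) -> str:
    """Top-strand sequence after Acc I cutting, Klenow fill-in of both
    two-base 5' overhangs, and blunt-end religation."""
    site = site.upper()
    if not ACC_I.fullmatch(site):
        raise ValueError("not an Acc I site")
    overhang = site[2:4]  # the MK bases lying between the two staggered cuts
    # Each end is filled in across the overhang, so it appears twice.
    return site[:2] + overhang + overhang + site[4:]


def fill_in_nucleotides(site: str) -> set:
    """Bases Klenow must supply: the overhang bases and their complements."""
    overhang = site.upper()[2:4]
    return set(overhang) | {COMPLEMENT[b] for b in overhang}


product = blunt_and_religate("GTATAC")
print(product, ACC_I.search(product) is None, sorted(fill_in_nucleotides("GTATAC")))
# GTATATAC True ['A', 'T']
```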

We are now ready to design a gene (see Sec. 4)
that will cause BPTI domains to appear on the outer
surface of an M13 derivative: LG7.

To obtain a novel protein domain attached to the
outside of M13, we insert DNA that codes for mature
BPTI after A23 of the precoat protein of M13. Mature
BPTI begins with an arginine residue, which is charged;
cleavage by signal peptidase I is normal in such cases.
Signal peptidase I (SP-I) cuts a chimera of M13 coat
protein and BPTI after A23, leaving mature BPTI attached
at its carboxy end to the amino terminus of M13 CP.

The following amino-acid sequence, called AA_seq2,
is constructed by inserting the sequence for mature
BPTI (shown underscored) immediately after the signal
sequence of M13 precoat protein (indicated by the
arrow) and before the sequence for the M13 CP.



AA_seq2

  1  MKKSLVLKASVAVATLVPMLSFA RPDFCLEPPYTGPCKARIIRYFYNAKA   50
                            ^---------------------------
 51  GLCQTFVYGGCRAKRNNFKSAEDCMRTCGGA AEGDDPAKAAFNSLQASAT   100
     -------------------------------
101  EYIGYAWAMVVVIVGATIGIKLFKKFTSKAS                       131



We adopt the convention that sequence numbers of
fusion proteins refer to the fusion, as coded, unless
otherwise noted. Thus the alanine that begins M13 CP
is referred to as "number 82", "number 1 of M13 CP", or
"number 59 of the mature BPTI-M13 CP fusion".

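The three numbering conventions differ only by fixed offsets. This sketch assumes a 23-residue signal sequence, 58-residue mature BPTI, and 50-residue mature coat protein, which is what the numbers 82 and 59 quoted above imply; the function names are hypothetical.

```python
SIGNAL_LEN = 23  # M13 precoat signal sequence: residues 1-23 of AA_seq2
BPTI_LEN = 58    # mature BPTI: residues 24-81
CP_LEN = 50      # mature M13 coat protein: residues 82-131


def fusion_from_bpti(n: int) -> int:
    """AA_seq2 number of residue n of mature BPTI."""
    if not 1 <= n <= BPTI_LEN:
        raise ValueError("BPTI residue out of range")
    return SIGNAL_LEN + n


def fusion_from_cp(n: int) -> int:
    """AA_seq2 number of residue n of mature M13 CP."""
    if not 1 <= n <= CP_LEN:
        raise ValueError("coat-protein residue out of range")
    return SIGNAL_LEN + BPTI_LEN + n


def mature_fusion_number(f: int) -> int:
    """Number within the mature (signal-cleaved) BPTI-M13 CP fusion."""
    if not SIGNAL_LEN < f <= SIGNAL_LEN + BPTI_LEN + CP_LEN:
        raise ValueError("residue not in the mature fusion")
    return f - SIGNAL_LEN


print(fusion_from_cp(1), mature_fusion_number(82))  # 82 59
print(fusion_from_bpti(5), fusion_from_bpti(55))    # 28 78
```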


The osp-ipbd gene is regulated by the lacUV5
promoter, so that the level of expression can be
regulated by the concentration of IPTG supplied in the
growth medium (see Sec. 4.1). The host strain of
E. coli should harbor the lacIq gene, which represses
the lacUV5 promoter to a greater extent than lacI does.
The osp-ipbd gene is ended by the trp attenuator so that
RNA polymerase will not read through into subsequent
genes. The osp-ipbd gene is expressed and processed in
parallel with the wild-type gene VIII. The novel
protein, which consists of BPTI tethered to an M13 CP
domain, constitutes only a fraction of the coat.

Affinity separation is able to separate phage carrying
only five or six copies of a molecule that has high
affinity for an affinity matrix (SMIT85); incorporation
of the chimeric protein results in about 30 copies of
the protein exposed on the surface. If this is
insufficient, additional copies may be provided.

Figure 9 shows, in stereo, a hypothetical model of
a short segment of the coat of a derivative of M13 in
which some coat protein monomers are fusions of mature
BPTI to the amino terminus of the normal M13 CP. The
figure shows only protein; the DNA, not shown, lies
inside the cylinder. The model of M13 coat is after the
model for f1 of Marvin and colleagues (BANN81). The
BPTI domain is taken from the Brookhaven Protein Data
Bank entry "6PTI" and was attached by standard
model-building methods that ensure that covalent bond
lengths and angles are close to acceptable values. The
space between the alpha-helical main chains is filled
by protein side groups so that the DNA is protected
from solvent. The figure is not meant to suggest that
BPTI fused to M13 CP will adopt the conformation shown,
which is arbitrary. Rather, the model shows that the
fusion protein could fit into the supramolecular
structure in a stereochemically acceptable fashion
without disturbing the internal structure of either
the M13 CP or the BPTI domain.

The osp-ipbd gene will use: a) the lacUV5
promoter, b) a Shine-Dalgarno sequence having high
homology to natural Shine-Dalgarno sequences, c) a
completely synthetic coding region having codons
assigned to optimize placement of restriction sites,
and d) the trp attenuator as transcriptional
terminator (see Secs. 4.1 and 4.2).



The ambiguous DNA sequence coding for AA_seq2,
shown in Table 3, is examined by PROSPECT for places
where recognition sites for any of the enzymes listed
in Table 21 could be created without altering the
amino-acid sequence (see Sec. 4.3). A master table
of enzymes is compiled from the catalogues of enzyme
suppliers listed in Table 4. The enzymes listed in
Table 21 are those that do not cut the OCV, the
construction of which is described above. The codes
used in the ambiguous DNA are shown in Table 1.
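PROSPECT itself is not described in this section. As a simplified stand-in, the sketch below asks where a plain recognition sequence is compatible with every position of an ambiguous sequence, assuming standard IUPAC ambiguity codes (the codes of Table 1 may differ). A faithful search must work codon by codon, because some synonymous sets, such as the serine codons TCN and AGY, cannot be written as one code per position.

```python
# IUPAC ambiguity codes: each letter lists the bases it allows
# (assumed to correspond to the codes of Table 1).
IUPAC = {"A": "A", "C": "C", "G": "G", "T": "T", "R": "AG", "Y": "CT",
         "M": "AC", "K": "GT", "S": "CG", "W": "AT", "H": "ACT",
         "B": "CGT", "V": "ACG", "D": "AGT", "N": "ACGT"}


def creatable_sites(ambiguous_dna: str, site: str) -> list:
    """0-based offsets where `site` (A/C/G/T only) is compatible with every
    ambiguity code of `ambiguous_dna`, i.e. could be introduced by
    choosing synonymous codons at those positions."""
    ambiguous_dna, site = ambiguous_dna.upper(), site.upper()
    hits = []
    for i in range(len(ambiguous_dna) - len(site) + 1):
        window = ambiguous_dna[i:i + len(site)]
        if all(base in IUPAC[code] for base, code in zip(site, window)):
            hits.append(i)
    return hits


# Gly-Pro (GGN CCN) can carry GGACC; Phe-Glu (TTY GAR) can carry TTCGAA.
print(creatable_sites("GGNCCN", "GGACC"))   # [0]
print(creatable_sites("TTYGAR", "TTCGAA"))  # [0]
```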

Using the procedure given in Sec. 4.3, we design an
osp-ipbd gene, such as that shown in Table 22 and in
Table 23. The recognition sequences of commercially
available enzymes that recognize five or more bases are
shown in Table 4. Some of these enzymes (e.g., Ban I or
Hph I) cut the OCV too often to be of value. A summary
of restriction sites in the designed osp-ipbd gene is
given in Table 24.



The entire DNA sequence of the osp-ipbd fusion gene,
with annotation, appears in Table 25, showing the useful
restriction sites and biologically important features,
viz. the lacUV5 promoter, the lacO operator, the
Shine-Dalgarno sequence, the amino acid sequence, the
stop codons, and the transcriptional terminator.



The ipbd gene is synthesized in several steps
using the method described in Sec. 5.1, generating ds
DNA fragments of 150 to 190 base pairs. In this
example, the 3' overlap window is set to run from
23 to 27 bases, which is generous. The end spacers that
are added to ensure efficient digestion are set to 8
bases, which is also generous. Syntheses designed with
smaller overlaps and shorter spacers would allow longer
fragments of dsDNA to be synthesized and consume less
of the reagents. Note, however, that Oliphant and
Struhl (OLIP87) required large excesses of restriction
enzymes to cut near the ends of their dsDNA; this
could have been because their end spacers were only
2 bases long.

All DNA synthesis and purification is done by
standard methods as described in Sec. 5.2.

The four steps (see Sec. 6.1) by which we clone
synthetic fragments of the osp-bpti gene (the osp-ipbd
gene of the present example) into pLG3 and its
derivatives are illustrated in Figure 20.



The sequence to be introduced into pLG3 is shown
in Table 26 and in Table 27. The segment is 158 bases
long and is synthesized from two shorter synthetic
oligo-nts as described in Sec. 5.1 of the generic
specification. The important features of this segment
are five restriction sites, the lacUV5 promoter, a
Shine-Dalgarno site, and the trpA attenuator, as shown
in Table 26.



Table 27 repeats the anti-sense strand shown in
Table 26. The 99-base fragment shown in upper-case
letters and underscored (5'-CCGTCC...CCTTCC-3' =
olig#3) is synthesized in the standard manner.
Similarly, the 100-base fragment of the sense strand
shown in lower case (5'-cgctca...aattg-3' = olig#4) is
synthesized. After annealing, the double-stranded
region is extended with Klenow fragment by the
procedure given above to make all 176 bases double
stranded. The overlap region is 23 base pairs long and
contains 14 GC pairs and 9 AT pairs. The DNA between
Avr II and Asu II does not code for anything in the
final ipbd gene; it is there so that the DNA can be
cut by both Avr II and Asu II at the same time in the
next step. This spacer was made rich in G and C so
that annealing of the two single-stranded DNA fragments
will be efficient. Eight bases have been added to the
left of Rsr II and nine bases have been added to the
left of Sau I (same specificity and cutting pattern as
Bsu36 I). These bases at the ends are not part of the
final product; they must be present so that the
restriction enzymes can bind and cut the synthetic DNA
to produce specific sticky ends.
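The lengths involved follow from simple overlap-extension arithmetic: two strands of lengths a and b that anneal over o bases give a duplex of a + b - o bases after Klenow fill-in. A minimal sketch, with hypothetical function names:

```python
def duplex_length(len_a: int, len_b: int, overlap: int) -> int:
    """Length of the fully double-stranded product when two single strands
    that anneal over `overlap` bases at their 3' ends are filled in."""
    if not 0 < overlap <= min(len_a, len_b):
        raise ValueError("overlap must be positive and fit within both strands")
    return len_a + len_b - overlap


def gc_fraction(gc_pairs: int, at_pairs: int) -> float:
    """Fraction of G-C pairs in an annealed region."""
    return gc_pairs / (gc_pairs + at_pairs)


print(duplex_length(99, 100, 23))    # 176 (olig#3 + olig#4)
print(duplex_length(99, 99, 24))     # 174 (olig#5 + olig#6)
print(round(gc_fraction(14, 9), 3))  # 0.609
```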

The synthetic DNA is cut with both Sau I and
Rsr II and purified by HPLC or PAGE. RF pLG3 is cut
with Sau I and Ava II and purified by HPLC or agarose
gel electrophoresis and electroelution. The large
piece from the phagemid and the synthetic DNA are
ligated and used to transform E. coli. Ampicillin-
resistant colonies are obtained, and plasmids are
screened by endonuclease digestion of RF phagemid DNA.
The desired product can be cut by Avr II, Asu II, or
BstE II, but the original phagemid cannot be cut by
any of these enzymes. To verify the insert, DNA from
isolates that have the correct restriction sites is
sequenced from about 10 bases above the Sau I site to
about 10 bases below the Rsr II site. The construct
with the correct insert is called pLG4.

The second step of the construction of the OCV is
illustrated in Tables 28 and 29. This second segment
of DNA is 155 bases long. As in the construction of
pLG4, two pieces of single-stranded DNA are
synthesized. A 99-base fragment of the anti-sense
strand (5'-CCACCA...CGTCCC-3' = olig#5) is shown in
upper-case letters and underscored; the other piece of
99 bases (5'-gatcta...atcacct-3' = olig#6) is shown
in lower case and is a fragment of the sense strand.
These strands are complementary over 24 bases.

... I, BstX I, or ... these enzymes.



The construct with the correct third insert is called
pLG6.

The construction of pLG7 is illustrated in Tables
32 and 33 and proceeds similarly to the constructions
of pLG4, pLG5, and pLG6. The two single-stranded
segments (olig#9 and olig#10) are synthesized,
annealed, and extended with Klenow fragment. Both the
synthetic DNA and RF pLG6 are cut with Nhe I and
Asu II, purified, and the appropriate pieces are
ligated and used to transform E. coli. Ampicillin-
resistant colonies are screened by restriction
digestion of phagemid RF DNA. The desired phagemid can
be cut with any of Sfi I, Hind III, Mlu I, BstX ...,
Nco I, while pLG6 can be cut by none of them.
To verify the fourth insert, DNA from phagemids with
the correct restriction sites is sequenced from about
10 bases above the Asu II site to about 10 bases below
the Nhe I site. The construct with the correct fourth
insert is called pLG7; the display of BPTI on the outer
surface of LG7 is verified by the methods of Sec. 8.

M13am429 is an amber mutant of M13 used to
reduce non-specific binding by the affinity matrix for
phages derived from M13. M13am429 is derived by
standard genetic methods (MILL72) from wtM13. M13am429
is grown on E. coli strain PE383 (F+, supE, rec-, ...)
and harvested by the standard method.

Phage LG7 is grown on E. coli strain PE383 in LB
broth with various concentrations of IPTG added to the
medium to induce the osp-ipbd gene. Phage LG7 is
obtained from cells grown with 0.0, 0.1, 1.0, 10.0, or
100.0 uM, or 1.0 mM IPTG, harvested (see Sec. 7) by the
method of Salivar (SALI64), and concentrated to obtain
a titre of 10^? pfu/ml by the method of Messing
(MESS83).



The preferred method of determining whether LG7
displays BPTI on its surface (see Sec. 8) is to
determine whether these phage can retain a labeled
derivative of trypsin (trp) or anhydrotrypsin (AHTrp)
on a filter that allows passage of unbound trp or
AHTrp. Trypsin contains 10 tyrosine residues and can
be iodinated with 125I by standard methods; we denote
the labeled trypsin as "trp*". Labeled anhydrotrypsin
is denoted as "AHTrp*". Other types of labels can be
used on trp or AHTrp, e.g., biotin or a fluorescent
label. AHTrp* or trp* is labeled to an activity of
0.? uCi/ug. A sample of 10^? LG7 phage (grown with
10 mM IPTG) is mixed with 10 ug of trp* or AHTrp* in
1.0 ml of a buffer of 10 mM KCl, adjusted to pH 8.0
with K2HPO4/KH2PO4. The mixture is passed through an
Amicon system fitted with a membrane filter that
allows passage of proteins smaller than ?0,000.
Filters are soaked in buffer containing trp or AHTrp
prior to the analysis. The filter is washed twice with
0.5 ml of buffer containing trp or AHTrp. The
radioactivity retained on the filter is quantitated
with a scintillation counter or other suitable device.
If each virion displays one copy of BPTI, then 0.?5 ug
of protein can be bound, which would give rise to
3 x 10^? disintegrations/minute on the filter.
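The expected signal on the filter can be estimated from the number of virions, the copies of BPTI per virion, and the specific activity of the label. The sketch assumes one trypsin bound per displayed BPTI, a trypsin molecular weight of about 23,300 (an approximate value for bovine trypsin, not stated in the text), and 2.22 x 10^6 dpm per uCi; the inputs are hypothetical because several figures in the paragraph above are illegible.

```python
AVOGADRO = 6.022e23          # molecules per mole
DPM_PER_MICROCURIE = 2.22e6  # disintegrations per minute per uCi
TRYPSIN_MW = 23_300          # g/mol, approximate value for bovine trypsin (assumption)


def bound_trypsin_ug(n_virions: float, copies_per_virion: float = 1.0) -> float:
    """Micrograms of trypsin retained if each displayed BPTI binds one trypsin."""
    moles = n_virions * copies_per_virion / AVOGADRO
    return moles * TRYPSIN_MW * 1e6  # grams -> micrograms


def expected_dpm(bound_ug: float, uci_per_ug: float) -> float:
    """Disintegrations per minute from `bound_ug` of label at `uci_per_ug`."""
    return bound_ug * uci_per_ug * DPM_PER_MICROCURIE


# Hypothetical inputs: 1e12 virions, one BPTI each, label at 0.1 uCi/ug.
ug = bound_trypsin_ug(1e12)
print(f"{ug:.4f} ug, {expected_dpm(ug, 0.1):.0f} dpm")
```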

An alternative way to quantitate display of BPTI
on the surface of LG7 is to use the stoichiometric
binding between trypsin and BPTI to titrate the BPTI.
... determined spectrophotometrically using the molar
extinction coefficients at 260 nm and 280 nm, corrected
for the increased length of LG7 as compared to wtM13.
For example, if a 1.0 ml solution that contains 10^?
pfu of LG7 phage grown with 1.0 mM IPTG inhibits
trypsin solutions up to 4.3 x 10^? M, we calculate that
there are approximately 30 BPTIs/GP (i.e.,
(4.8 x 10^? molecules of BPTI/l)/(1.6 x 10^? phage/l)).
Inhibition of a specified concentration of trypsin is
most easily measured spectrophotometrically using a
peptide-linked dye, such as N-alpha-benzoyl-Arg-pNA
(TSCH87).
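The titration arithmetic converts the molar concentration of inhibited trypsin into molecules per liter and divides by phage per liter, assuming 1:1 BPTI:trypsin binding. The inputs below are hypothetical, chosen only to give a ratio near 30; the function name is an illustration, not part of the described method.

```python
AVOGADRO = 6.022e23  # molecules per mole


def bpti_per_gp(trypsin_inhibited_molar: float, phage_per_liter: float) -> float:
    """Displayed BPTI per genetic package (phage), assuming each BPTI
    inactivates exactly one trypsin molecule."""
    bpti_per_liter = trypsin_inhibited_molar * AVOGADRO
    return bpti_per_liter / phage_per_liter


# Hypothetical titration: 8.0e-8 M trypsin inhibited by 1.6e15 phage per liter.
print(round(bpti_per_gp(8.0e-8, 1.6e15), 1))  # 30.1
```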

Alternatively, binding to an affinity column may
be used to demonstrate the presence of BPTI on the
surface of phage LG7. An affinity column of 2.0 ml
total volume having BioRad Affi-Gel 10(TM) matrix and
30 mg of AHTrp as affinity material is prepared by the
method of BioRad. The void volume (Vv) of this column
is, by hypothesis, 1.0 ml. This affinity column is
denoted (AHTrp).

A sample of 10^? M13am429 is applied to (AHTrp) in
1.0 ml of 10 mM KCl buffered to pH 8.0 with
KH2PO4/K2HPO4. The column is then washed with the same
buffer until the optical density at 280 nm of the
effluent returns to base line or 4 x Vv have been
passed through the column, whichever comes first.
Samples of LG7 or LG10 are then applied to the blocked
(AHTrp) column at 10^? pfu/ml in 1.0 ml of the same
buffer. The column is then washed again with the same
buffer until the optical density at 280 nm of the
effluent returns to base line or 4 x Vv have been
passed through, whichever comes first. Following this
wash, a gradient of KCl from 10 mM to 2 M in 3 x Vv,
buffered to pH 8.0 with phosphate, is passed over the
column. The first KCl gradient is followed by a KCl
gradient running from 2 M to 5 M in 3 x Vv. The second
KCl gradient is followed by a gradient of guanidinium
Cl from 0.0 M to 2.0 M in 2 x Vv in 5 M KCl, buffered to
pH 8.0 with phosphate. Fractions of 50 ul are
collected and assayed for phage by plating 4 ul of
each fraction at suitable dilutions on sensitive cells.
Retention of phage on the column is indicated by
appearance of LG7 phage in fractions that elute
significantly later from the column than control phage
LG10 or wtM13. A successful isolate of LG7 that
displays BPTI is identified, the bpti insert and
junctions are sequenced, and this isolate is used for
further work described below. It is likely that a
significant fraction of clonal isolates from the same
ligation that are characterized as identical by
restriction digestion will similarly display BPTI.
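The elution program can be written as a function of the volume eluted. This sketch assumes each gradient is linear in volume and counts volume from the start of the first KCl gradient (the initial wash is not modeled); the function name is hypothetical, and `gdm_max_molar` lets the same sketch cover a program with a different guanidinium endpoint.

```python
def eluent_at(v_ml: float, vv_ml: float = 1.0, gdm_max_molar: float = 2.0):
    """(KCl, guanidinium Cl) molarities at `v_ml` ml into the gradients,
    counted from the start of the first KCl gradient; linear gradients."""
    g1, g2, g3 = 3 * vv_ml, 3 * vv_ml, 2 * vv_ml
    if v_ml < 0:
        raise ValueError("volume must be non-negative")
    if v_ml <= g1:                    # 10 mM -> 2 M KCl over 3 x Vv
        return 0.010 + (2.0 - 0.010) * v_ml / g1, 0.0
    if v_ml <= g1 + g2:               # 2 M -> 5 M KCl over 3 x Vv
        return 2.0 + 3.0 * (v_ml - g1) / g2, 0.0
    if v_ml <= g1 + g2 + g3:          # 0 -> gdm_max GdmCl in 5 M KCl over 2 x Vv
        return 5.0, gdm_max_molar * (v_ml - g1 - g2) / g3
    return 5.0, gdm_max_molar


total_ml = 8 * 1.0                     # 3 + 3 + 2 column volumes with Vv = 1.0 ml
n_fractions = round(total_ml / 0.050)  # 50-ul fractions
print(eluent_at(4.5), eluent_at(7.0), n_fractions)  # (3.5, 0.0) (5.0, 1.0) 160
```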

If vgDNA is used to obtain a functional fusion
between a BPTI mutant and M13 CP (vide infra), then DNA
from a clonal isolate is sequenced in the regions that
were variegated. Then gratuitous restriction sites for
useful restriction enzymes are removed, if possible, by
silent codon changes as follows. A de novo piece of
synthetic DNA is synthesized such that the selected
amino acid sequence is preserved, and it is cloned into
pLG7. The sequence numbers of residues in OSP-IPBD will
be changed by any insertions; hereinafter, however, we
will denote residues inserted after residue 23 as 23a,
23b, etc. Insertions after residue 81 will be denoted
as 81a, 81b, etc. This preserves the numbering of
residues between C5 in BPTI and C55 in BPTI. Residue C5
of BPTI is always denoted as 28 in the fusion; residue
C55 of BPTI is always denoted as 78 in the fusion, and
the intervening residues have constant numbers.
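The lettering convention keeps every original residue number fixed no matter how many residues are inserted. A minimal sketch with a hypothetical helper, assuming the 131-residue AA_seq2 as coded:

```python
def fusion_labels(n_after_23: int = 0, n_after_81: int = 0, length: int = 131):
    """Residue labels for AA_seq2 with extra residues inserted after 23 and
    after 81, named 23a, 23b, ... and 81a, 81b, ..., so that every original
    residue (in particular C28 and C78) keeps its number."""
    if max(n_after_23, n_after_81) > 26:
        raise ValueError("at most 26 lettered insertions per site")
    labels = []
    for n in range(1, length + 1):
        labels.append(str(n))
        if n == 23:
            labels += [f"23{chr(ord('a') + i)}" for i in range(n_after_23)]
        if n == 81:
            labels += [f"81{chr(ord('a') + i)}" for i in range(n_after_81)]
    return labels


labels = fusion_labels(n_after_23=2, n_after_81=4)
print(labels[22:26], labels[80:86])
# ['23', '23a', '23b', '24'] ['79', '80', '81', '81a', '81b', '81c']
```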



Should LG7 phage from cells grown with 10 mM IPTG
fail to display BPTI on its surface, we have several
options. We might try to determine why the
construction failed to work as expected. There are
various possible modes of failure, including: a) BPTI
is not cleaved from the M13 signal sequence, b) BPTI is
cleaved from the M13 CP, and c) the chimeric protein is
made and cleaved after the signal sequence, but the
processed protein is not incorporated into the M13
coat. BPTI has been secreted from E. coli (MARK86);
however, the M13 coat-protein signal sequence was not
used. Therefore problems stemming from the signal
sequence are unlikely, but possible. We could
determine whether BPTI was present in the periplasm or
bound to the inner membrane of LG7-infected cells by
assays using labeled trypsin or anhydrotrypsin.
Proteins in the periplasm can be freed through
spheroplast formation using lysozyme and EDTA in a
concentrated sucrose solution (BIRD81, ...). If
BPTI were free in the periplasm, it would be found in
the supernatant. Trypsin labeled with 125I would be
mixed with supernatant and passed over a non-denaturing
molecular sizing column, and the radioactive fractions
collected. The radioactive fractions would then be
analyzed by SDS-PAGE and examined for BPTI-sized bands
by silver staining.

Spheroplast formation exposes proteins anchored in
the inner membrane. Spheroplasts would be mixed with
AHTrp* and then either filtered or centrifuged to
separate them from unbound AHTrp*. After washing with
hypertonic buffer, the spheroplasts would be analyzed
for extent of AHTrp* binding.

If BPTI were found free in the periplasm, then we
would expect that the chimeric protein was being
cleaved both between BPTI and the M13 mature coat
sequence and between BPTI and the signal sequence. In
that case, we should alter the BPTI/M13 CP junction by
inserting vgDNA at codons for residues 78-82 of
AA_seq2.

If BPTI were found attached to the inner membrane,
then two hypotheses can be formed. The first is that
the chimeric protein is being cut after the signal
sequence, but is not being incorporated into the LG7
virion; the treatment would also be to insert vgDNA
between residues 78 and 82 of AA_seq2. The alternative
hypothesis is that BPTI could fold and react with
trypsin even if the signal sequence is not cleaved.
N-terminal amino acid sequencing of trypsin-binding
material isolated from cell homogenate determines what
processing is occurring. If the signal sequence were
being cleaved, we would use the procedure above to vary
residues between C78 and A82; subsequent passes would
add residues after residue 81. If the signal sequence
were not being cleaved, we would vary residues between
23 and 27 of AA_seq2. Subsequent passes through that
process would add residues after 23.

If BPTI were found neither in the periplasm nor on
the inner membrane, then we would expect that the fault
was in the signal sequence or the signal-sequence-to-BPTI
junction. The treatment in this case would be to
vary residues between 23 and 27.
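The diagnostic branches above reduce to a small decision function. This sketch is an illustrative restatement with hypothetical parameter names; the return value is the AA_seq2 residue range to variegate.

```python
from typing import Optional


def region_to_vary(in_periplasm: bool, on_inner_membrane: bool,
                   signal_cleaved: Optional[bool] = None) -> str:
    """AA_seq2 residue range to variegate, following the diagnosis above."""
    if in_periplasm:
        # BPTI released from both the signal sequence and the M13 CP.
        return "78-82"
    if on_inner_membrane:
        if signal_cleaved is None:
            raise ValueError("N-terminal sequencing is needed to choose")
        # Cleaved but not assembled: BPTI/CP junction; uncleaved: signal junction.
        return "78-82" if signal_cleaved else "23-27"
    # Neither periplasmic nor membrane-bound: suspect the signal-sequence junction.
    return "23-27"


print(region_to_vary(False, True, signal_cleaved=False))  # 23-27
```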

Analytical experiments to determine what has gone
wrong take time and effort and, for the foreseen
outcomes, indicate variations in only two regions.
Therefore, we believe it prudent to try the synthetic
experiments described below without doing the analysis.
For example, these six experiments that introduce
variegation into the bpti-gene VIII fusion could be
tried:

1) 3 variegated codons between residues 78 and 82
   using olig#12 and olig#13,
2) 3 variegated codons between residues 23 and 27
   using olig#14 and olig#15,
3) 5 variegated codons between residues 78 and 82
   using olig#13 and olig#12a,
4) 5 variegated codons between residues 23 and 27
   using olig#15 and olig#14a,
5) 7 variegated codons between residues 78 and 82
   using olig#13 and olig#12b, and
6) 7 variegated codons between residues 23 and 27
   using olig#15 and olig#14b.

To alter the BPTI-M13 CP junction, we introduce
DNA variegated at codons for residues between 78 and 82
into the Sph I and Sfi I sites of pLG7. The residues
after the last cysteine are highly variable in amino
acid sequences homologous to BPTI, both in composition
and length; in Table 25 these residues are denoted as
G79, G80, and A81. The first part of the M13 CP is
denoted as A82, E83, and G84. One of the oligo-nts
olig#12, olig#12a, or olig#12b and the primer olig#13
are synthesized by standard methods. The oligo-nts
are:



residue       75  76  77  78  79  80  81  82  83
5' gc|gag|cCC|ATC|CCT|ACC|TCC|qfk|qfk|qfk|CCT|CAA|-
              84  85  86  87  88  89  90  91
              CCT|CAT|CAT|CCC|CCC|AAA|CCC|CCC|gcg|gcc 3'   olig#12

residue       75  76  77  78  79  80  81  81a 81b
5' gc|gag|cCC|ATC|CGT|ACC|TCC|qfk|qfk|qfk|qfk|qfk|-
              82  83  84  85  86  87
              CCT|CAA|GCT|CAT|CAT|CCC|-
              88  89  90  91
              CCC|AAA|CCC|CCC|gcg|gcc 3'   olig#12a

residue       75  76  77  78  79  80  81  81a 81b
5' gc|gag|cCC|ATG|CCT|ACC|TCC|qfk|qfk|qfk|qfk|qfk|-
              81c 81d 82  83  84  85  86  87
              qfk|qfk|CCT|CAA|CCT|CAT|CAT|CCC|-
              88  89  90  91
              CCC|AAA|CCC|CCC|gcg|gcc 3'   olig#12b

residue            91  90  89  88  87  86
5' ggc|cgc|CCC|CCC|TTT|CCC|CCC|ATC 3'   olig#13



where q is a mixture of (0.26 T, 0.18 C, 0.26 A, and
0.30 G), f is a mixture of (0.22 T, 0.16 C, 0.40 A, and
0.22 G), and k is a mixture of equal parts of T and G.
The bases shown in lower case at either end are spacers
and are not incorporated into the cloned gene. The
primer is complementary to the 3' end of each of the
longer oligo-nts. One of the variegated oligo-nts and
the primer olig#13 are combined in equimolar amounts
and annealed. The dsDNA is completed with all four
(nt)TPs and Klenow fragment. The resulting dsDNA and
RF pLG7 are cut with both Sfi I and Sph I, purified,
mixed, and ligated. This ligation mixture goes through
the process described in Sec. 15, in which we select a
transformed clone that, when induced with IPTG, binds
AHTrp.
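The q, f, and k mixtures fix the expected amino-acid distribution at each variegated codon. The sketch below enumerates all 32 qfk codons under the standard genetic code (an assumption; the code table is not restated in this section) and sums their probabilities; names are hypothetical.

```python
from collections import defaultdict
from itertools import product

BASES = "TCAG"
# Standard genetic code; codons in TCAG order (TTT, TTC, TTA, TTG, TCT, ...).
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODE = {a + b + c: AMINO[16 * i + 4 * j + k]
        for (i, a), (j, b), (k, c) in product(enumerate(BASES), repeat=3)}

Q = {"T": 0.26, "C": 0.18, "A": 0.26, "G": 0.30}  # first codon position
F = {"T": 0.22, "C": 0.16, "A": 0.40, "G": 0.22}  # second codon position
K = {"T": 0.50, "G": 0.50}                        # third position: equal T and G


def qfk_distribution() -> dict:
    """Probability of each amino acid ('*' = stop) encoded by one qfk codon."""
    dist = defaultdict(float)
    for (b1, p1), (b2, p2), (b3, p3) in product(Q.items(), F.items(), K.items()):
        dist[CODE[b1 + b2 + b3]] += p1 * p2 * p3
    return dict(dist)


dist = qfk_distribution()
for aa, p in sorted(dist.items(), key=lambda kv: -kv[1]):
    print(f"{aa} {p:.4f}")
```

With k limited to T and G, the only reachable stop codon is TAG, at 0.26 x 0.40 x 0.5 = 0.052 per codon, so about 85% of clones with three variegated codons carry no stop (0.948^3 is about 0.85).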

To vary the junction between the M13 signal sequence
and BPTI, we introduce DNA variegated at codons for
residues between 23 and 27 into the Kpn I and Xho I
sites of pLG7. The first three residues are highly
variable in amino acid sequences homologous to BPTI.
Homologous sequences also vary in length at the amino
terminus. One of the oligo-nts olig#14, olig#14a, or
olig#14b and the primer olig#15 are synthesized by
standard methods. The oligo-nts are:

residue      17  18  19  20  21  22  23  24  25
5' g|gcc|gcC|GTA|CCC|ATC|CTG|TCT|TTT|CCT|qfk|qfk|-
             26  27  28  29  30
             qfk|TTC|TGT|CTC|GAG|cgc|ccg|cga 3'   olig#14

residue      17  18  19  20  21  22  23  24  25  26
5' g|gcc|gcC|CTA|CCG|ATG|CTC|TCT|TTT|GCT|qfk|qfk|qfk|-
             26a 26b 27  28  29  30
             qfk|qfk|TTC|TGT|CTC|GAG|cgc|ccg|cga 3'   olig#14a

residue      17  18  19  20  21  22  23  24  25  26
5' g|gcc|gcG|CTA|CCG|ATC|CTC|TCT|TTT|CCT|qfk|qfk|qfk|-
             26a 26b 26c 26d 27  28  29  30
             qfk|qfk|qfk|qfk|TTC|TGT|CTC|GAG|cgc|ccg|cga 3'   olig#14b

5' tcg|cgg|gcg|CTC|GAG|ACA|GAA 3'   olig#15

where q is a mixture of (0.26 T, 0.18 C, 0.26 A, and
0.30 G), f is a mixture of (0.22 T, 0.16 C, 0.40 A, and
0.22 G), and k is a mixture of equal parts of T and G.
The bases shown in lower case at either end are spacers
and are not incorporated into the cloned gene. One of
the variegated oligo-nts and the primer are combined in
equimolar amounts and annealed. The dsDNA is
completed with all four (nt)TPs and Klenow fragment.
The resulting dsDNA and RF pLG7 are cut with both Kpn I
and Xho I, purified, mixed, and ligated. This ligation
mixture goes through the process described in Sec. 15,
in which we select a transformed clone that, when
induced with IPTG, binds AHTrp or trp.

Other numbers of variegated codons could be used.

If none of these approaches produces a working
chimeric protein, we may try a different signal
sequence. If that does not work, we may try a different
OSP in M13, because the structural data clearly indicate
that BPTI could not be joined to the carboxy terminus.
The next best OSP of M13 is the gene III protein,
because fusion data exist for it (SMIT85, CRUZ88).



Example I, Part II

BPTI binds very tightly to trypsin
(Kd = 6.0 x 10^-14 M) and to anhydrotrypsin, so that
these molecules are not preferred for optimizing the
amount of BPTI to display on LG7 or the amount of
affinity molecule to attach to the column. Tschesche
et al. reported on the binding of several BPTI
derivatives to various proteases:
Dissociation constants for BPTI derivatives, Molar

Residue 15  Trypsin            Chymotrypsin       Elastase            Elastase
            (bovine pancreas)  (bovine pancreas)  (porcine pancreas)  (human leukocytes)
lysine      6.0 x 10^-14       9.0 x 10^-?
glycine
alanine
valine
leucine

Remaining legible entries (cell positions not recoverable):
3.5 x 10^-6, 7.0 x 10^-?, 2.8 x 10^-8, 2.5 x 10^-?,
5.7 x 10^-?, 1.1 x 10^-10, 1.9 x 10^-8, 2.9 x 10^-9



From the report of Tschesche et al. we infer that
molecular pairs marked ... have Kds greater than
3.5 x 10^-6 M and that molecular pairs marked ... have
Kds much greater than 3.5 x 10^-6 M. Because of the
wealth of data about the binding of BPTI and various
mutants to trypsin and other proteases (TSCH87), we can
proceed in various ways. (For other PBDs we can obtain
two different monoclonal antibodies, one with a high
affinity having Kd of order 10^-? M, and one with a
moderate affinity having Kd on the order of 10^-? M.)
In this example, we may use: a) the moderate binding
between BPTI and human leukocyte elastase (HuLE1), b)
the moderately strong binding of porcine elastase to
BPTI(V15), or c) the binding of BPTI(A15) (residue 38
in the osp-ipbd gene) to trypsin (weak but detectable)
or to porcine pancreatic elastase.



Following the teachings of Sec. 10, we compare the
retention of LG7 virions to the retention of wild-type
M13 on (AHTrp). M13 derivatives having more DNA than
wild-type M13 have correspondingly longer virions. Thus
we will create LG8, which differs from LG7 only in
having stop codons at codons 2 and 3, and an altered
codon at codon 7 of the osp-ipbd gene. Phage LG8 will
have exactly as much DNA as LG7; therefore the LG8
virion is exactly as long as the LG7 virion. LG8
cannot, however, display BPTI on its surface. To
generate these mutations we synthesize the oligo-nt

5' (121) aac|gct|agc|ctt|Cag|aac|cag|aga|ttA|ctA|cat|-
         agt|aag|cctt (80) 3'   olig#11

that is complementary to bases 80 through 121 of the
ipbd gene, shown in Table 23, except for the three
upper-case, underscored bases. Olig#11 and the
primers olig#24, olig#25, and olig#26 are annealed to
circular ssDNA from LG7. Klenow fragment (from US
Biochemical) and all four (nt)TPs are used to complete
the circular dsDNA. After treatment with Klenow
fragment, the dsDNA is treated with ligase. Cells are
transformed with the ligated dsDNA and, after a short
grow-out, the cells are plated on ampicillin-containing
LB agar. By changing the third base in codon 7, we
have destroyed the unique Afl II site in pLG7. Thus we
can screen colonies for loss of the Afl II site. To
confirm the construction, DNA from plaques with no
Afl II site is sequenced from about base ?0 to about
base ?40 of the osp-ipbd gene.
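The design of olig#11 can be checked by reverse-complementing its changed regions. This sketch assumes that the printed olig#11 is the antisense strand in codon register and that the antisense sequence at the Afl II site (CTTAAG, a standard value) read cttaag before the change; both are assumptions, and the printed bases are partly uncertain.

```python
COMPLEMENT = str.maketrans("ACGTacgt", "TGCAtgca")
STOPS = {"TAA", "TAG", "TGA"}


def revcomp(seq: str) -> str:
    """Reverse complement, preserving case."""
    return seq.translate(COMPLEMENT)[::-1]


# Fragments of olig#11 as printed (antisense; upper case = changed bases).
stop_region = revcomp("ttActA").upper()
afl_region_before = revcomp("cttaag").upper()  # assumed unmutated antisense
afl_region_after = revcomp("cttCag").upper()

print(stop_region, [stop_region[i:i + 3] in STOPS for i in (0, 3)])
print("CTTAAG" in afl_region_before, "CTTAAG" in afl_region_after)
# TAGTAA [True, True]
# True False
```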

To expedite identification of different M13-derived
phage, we replace the amp^R gene of LG3 with the
tet^R gene from pBR322. Plasmid pBR322 is cut with
Bsm I at the unique site at 1353, and the linearized
DNA is blunted with Klenow fragment and purified. The
blunt DNA is cut with Aat II, and the 1,??3-base
tet^R-bearing fragment is purified by agarose gel
electrophoresis or HPLC. Plasmid pLG3 ds DNA is cut
with ... at the unique site, and the linearized DNA is
blunted with Klenow fragment and purified. The linear
blunt DNA is digested with Aat II, and the 7.3-kb
fragment is isolated. The two isolated DNA fragments
are mixed, annealed, ligated, and used to transform
competent E. coli cells. The transformed cells are
selected with tetracycline. The correct construction
contains Sal I, EcoR I, and EcoR V sites, but LG3
contains none of these. The correct construction,
having 9.2 kb, is easily distinguished from pBR322 and
is called LG10. DNA from phage LG10 is sequenced in
the vicinity of the junctions of the newly inserted
tet^R gene to confirm the construction.

The phage LG7 is grown at various levels of IPTG
in the medium and harvested in the way previously
described. An affinity column having bed volume of 2.0
ml and supporting an amount of HuLE1 picked from the
range 0.1 mg to 30.0 mg on 1 ml of Bio-Rad Affi-Gel 10 (TM)
or Affi-Gel 15 (TM) is designated (HuLE1).
An appropriate set of densities of HuLE1 on the column
is (0.1 mg/ml, 0.5 mg/ml, 2.0 mg/ml, 3.0 mg/ml, 15.0
mg/ml, and 30.0 mg/ml). The V_V of (HuLE1) is, by
hypothesis, 1.0 ml. The elution of LG7 phage is
compared to the elution of LG10 on (HuLE1) having
varying amounts of HuLE1 affixed. The columns are
eluted in a standard way:

1) 10 mM KCl buffered to pH 8.0 with phosphate,
until optical density at 280 nm falls to base line
or 4 x V_V, whichever is first,

2) a gradient of 10 mM to 2 M KCl in 3 x V_V, pH
held at 8.0 with phosphate,

3) a gradient of 2 M to 5 M KCl in 3 x V_V,
phosphate buffer to pH 8.0,

4) constant 5 M KCl plus 0 to 0.8 M guanidinium Cl
in 2 x V_V, with phosphate buffer to pH 8.0.

The preferred level of induction (IPTG_optimal) and
amount of affinity molecule on the matrix
(DoAMoM_optimal) are the settings that give the
sharpest LG7 elution peak that shows significant
retardation as compared to LG3, which carries no BPTI.
By hypothesis, the best separation occurs for the
amount of BPTI/CP produced when the cells are induced
with 10.0 uM IPTG and when 4.0 mg HuLE1/ml is applied
to Bio-Rad Affi-Gel 10 (TM).



When the amount of BPTI/CP and the amount of
HuLE1/volume of support have been optimized, we turn to
optimization of elution rate, initial ionic strength,
and the amount of GP/(volume of support). These
parameters can be optimized separately.

Using optimal BPTI/CP and HuLE1/volume of support,
we measure the elution volume of LG7 and LG10 for
different elution rates, viz. 1, 1/2, 1/4, 1/8, and 1/16
times the maximum flow rate. M13 is shear resistant,
so that the pressure that can be applied across the
column is limited only by the mechanical properties of
the support material. By hypothesis, 1/4 of maximum
elution rate is better than 1/2, but 1/8 is about the
same as 1/4. Therefore 1/4 of the maximum elution rate
will be used.



Elution volumes of LG7, obtained from cells grown
on media that is 2.0 mM in IPTG, are measured at optimal
DoAMoM and elution rate for loadings of 10^9, 10^10,
10^11, and 10^12 pfu. By hypothesis, 10^12 pfu of
LG7 overloads the column and a significant number of
phage elute before their characteristic position in the
KCl gradient. We also find that 10^11 pfu overloads the
column only slightly, and that 10^10 pfu does not
overload the column. Because the use of the affinity
separation in Sec. 15 will involve a population in
which no single member is more than one part in 10**, we
conclude that 10^^ pfu of a variegated population could
be applied to a column of 1.0 ml matrix volume without
overloading with respect to any one species. The
overloading of a 1.0 ml column by 10^^ pfu also
indicates that the initial column that captures
indiscriminately adhesive phage should be 5 to 10 times
as large as the column that supports the target
material.



Elution volumes of LG7 and LG10 obtained from
cells grown on media that is 2.0 mM in IPTG are
measured at optimal DoAMoM and elution rate and for a
loading of 10^10 pfu for various initial ionic
strengths: 1.0 mM, 5.0 mM, 10.0 mM, 20.0 mM, and 50.0
mM. We find that LG10 is slightly retarded by the
column when loaded at 1.0 mM KCl, but that LG7 always
comes off the column at its characteristic place in the
gradient. We use 10.0 mM as initial ionic strength in
all remaining affinity separations.



To determine the sensitivity of chromatography of
phage that display variants of BPTI on their surfaces
(Sec. 10.1), we prepare artificial mixtures of two
closely related phage that differ only at one residue
in the BPTI domain. One variety of phage has strong
affinity for the column used in this step, while the
other phage has no affinity for the column. We
chromatograph these mixtures to discover how little of
the phage that binds to the column can be detected
within a large majority of phage that do not bind the
column.

For these tests we choose AHTrp as AfM(BPTI). A
column having 2 ml bed volume is prepared with
(DoAMoM_optimal) mg of AHTrp/(ml of Affi-Gel 10 (TM)).
The column is called (AHTrp) and has V_V = 1.0 ml.

A new phage, LG9, is prepared that displays
BPTI(V15) as IPBD, in contrast to LG7, which displays
BPTI(K15, wild-type) as IPBD. Residue 15 of BPTI is
residue 38 of the osp-ipbd gene. We introduce the
change K38 to V by replacement of a short segment of
the osp-ipbd gene. The two oligo-nts
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4 0 


4 1 


42 
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TGt 


gtt 


GCt 


CCt 


ATa 


ATa 


CCC 


TAT 1 TTC 


cc 




acA 


C^.A 


cgA [ gcA 1 taT 


tar 


gcG 


az^ i aag 


Aoa I 




(BssH II) 
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51 


TAC 


AAC 


CCT 


AAA 


CCA 


CG 


atg 


ttg 


cga 


ttt 


cgt 


cc 



3' olig-16 
5' oligrl7 



Stu T 



are synthesized by standard methods and annealed; the
lower case letters in olig-16 and the upper case
letters in olig-17 are mutant with respect to pLG7.
Plasmid pLG7 DNA is digested with both Apa I and Stu I
and the large piece purified. The ds oligo-nt is added
to the purified backbone of pLG7 and ligated; the
ligated DNA is used to transform competent cells.
After a short grow-out, the cells are plated on
ampicillin-containing plates and Amp^R colonies are
picked. The mutations destroy the unique BssH II site,
thus we can screen colonies through restriction
digestion. To confirm the construction, DNA from
colonies having the correct restriction digestion
pattern is sequenced from about 10 bases above the
Stu I site to about 10 bases below the Apa I site. The
correct construction is called pLG9.

To expedite differentiation between LG7 and an
LG9-derivative phage, we replace the amp^R gene of LG9
with the tet^R gene from pBR322. Plasmid pBR322 is cut
with Bsm I at the unique site at 1353 and the
linearized DNA is blunted with Klenow fragment and
purified. The blunt DNA is cut with Aat II and the
1428-base tet^R-bearing fragment purified by agarose gel
electrophoresis or HPLC. Plasmid pLG9 ds DNA is cut
with Xho I at the unique site and the linearized DNA is
blunted with Klenow fragment and purified. The linear,
blunt DNA is digested with Aat II and the 7.8 kb
fragment is isolated. The two isolated DNA fragments
are mixed, annealed, ligated, and used to transform
competent E. coli cells. The transformed cells are
selected with tetracycline. The correct construction
contains Sal I, EcoR I, and EcoR V sites, but LG9
contains none of these. The correct construction,
having 9.2 kb, is easily distinguished from pBR322 and
is called LG11. DNA from phage LG11 is sequenced in
the vicinity of the junctions of the newly inserted
tet^R gene to confirm the construction.

LG7 and LG11 are grown with optimum IPTG (2.0 mM)
and harvested. Mixtures are prepared in the ratios

LG7 : LG11 = 1 : V_lim,

where V_lim ranges from 10^^ to 10^ by factors of 10.
Large values of V_lim are tested first; once a value is
found that allows recovery of LG7, smaller values of
V_lim are not tested. Once a value of V_lim is found
that allows recovery of LG7, we test values that are
larger by 2- or 3-fold so that V_lim is determined
within a factor of 2.
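The two-stage search for V_lim just described (test decades from the most dilute mixture downward, stop at the first recovery, then probe upward) can be sketched as follows. This is only an illustration under stated assumptions: `recovers(v)` is a hypothetical callback standing in for one complete chromatography-and-screening experiment on an LG7:LG11 = 1:v mixture, the decade range 10^11 down to 10^2 is an assumed default, and the refinement uses doubling as one way to bracket V_lim within a factor of 2.

```python
from typing import Callable, Optional


def find_v_lim(recovers: Callable[[float], bool],
               start_exp: int = 11, stop_exp: int = 2) -> Optional[float]:
    """Estimate V_lim, the largest LG11 excess from which LG7 is still recovered.

    recovers(v) is a hypothetical stand-in for one experiment on a 1:v mixture.
    start_exp and stop_exp bound the decades tested; both are assumptions.
    """
    # Stage 1: test successive powers of 10, largest first; stop at the first success.
    found = None
    for exp in range(start_exp, stop_exp - 1, -1):
        v = 10.0 ** exp
        if recovers(v):
            found = v
            break
    if found is None:
        return None  # LG7 was not recovered at any tested ratio

    # Stage 2: probe 2x, 4x and 8x the successful value (8x stays below the next
    # decade up) and keep the largest value that still allows recovery.
    best = found
    for factor in (2, 4, 8):
        if recovers(found * factor):
            best = found * factor
        else:
            break
    return best


if __name__ == "__main__":
    # Toy predicate: recovery succeeds whenever v <= 4.0e8.
    print(find_v_lim(lambda v: v <= 4.0e8))
```

Because each real "call" is a full chromatography run, the design keeps the number of probes small: at most one per decade plus three refinement steps.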

The column (AHTrp) is first blocked by treatment
with 10^^ virions of M13am429 in 100 ul of 10 mM KCl
buffered to pH 8.0 with phosphate; the column is washed
with the same buffer until OD260 returns to base line
or 4 x V_V have passed through the column, whichever
comes first. One of the mixtures of LG7 and LG11
containing 10^^ pfu in 1 ml of the same buffer is
applied to (AHTrp). The column is eluted in a standard
way:

1) 10 mM KCl buffered to pH 8.0 with phosphate,
until optical density at 280 nm falls to base line
or 4 x V_V, whichever is first (discard effluent),

2) a gradient of 10 mM to 2 M KCl in 3 x V_V, pH
held at 8.0 with phosphate (30 x 100 ul
fractions),

3) a gradient of 2 M to 5 M KCl in 3 x V_V,
phosphate buffer to pH 8.0 (30 x 100 ul
fractions),

4) constant 5 M KCl plus 0 to 0.8 M guanidinium Cl
in 2 x V_V, with phosphate buffer to pH 8.0 (20 x
100 ul fractions),

5) constant 5 M KCl plus 0.8 M guanidinium Cl in
1.2 x V_V, with phosphate buffer to pH 8.0 (12 x
100 ul fractions).
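The fraction counts in this elution program follow directly from V_V = 1.0 ml and 100 ul fractions. The small Python sketch below tabulates them; the step labels are shorthand, the program is typed in by hand from the list above, and nothing here models the chromatography itself.

```python
# Collected steps of the (AHTrp) elution program: (step, volume in multiples of V_V).
# Step 1 (wash to baseline or 4 x V_V) is discarded and therefore not listed.
COLLECTED_STEPS = [
    ("2: 10 mM -> 2 M KCl", 3.0),
    ("3: 2 M -> 5 M KCl", 3.0),
    ("4: 5 M KCl, 0 -> 0.8 M GdmCl", 2.0),
    ("5: 5 M KCl, 0.8 M GdmCl", 1.2),
]


def fraction_counts(steps, v_v_ul=1000.0, fraction_ul=100.0):
    """Number of fractions collected per step for a column of volume v_v_ul."""
    return {name: round(mult * v_v_ul / fraction_ul) for name, mult in steps}


counts = fraction_counts(COLLECTED_STEPS)
for name, n in counts.items():
    print(f"{name:32s} {n:3d} fractions")
print("total:", sum(counts.values()))
```

Scaling the column (a different `v_v_ul`) changes every count proportionally, which is why the program is written in multiples of V_V.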



Samples of 4 ul from each fraction are plated at
suitable dilution on phage-sensitive Sup^- cells (so
that M13am429 will not grow). In addition to the
effluent fractions, a sample is removed from the column
and used as an inoculum for phage-sensitive Sup^- cells.
Plaques are transferred to ampicillin-containing LB
agar. Colonies that are ampicillin-resistant are
tested for display of BPTI(K15) by use of Trp* or
AHTrp*. Testing begins with colonies obtained by
culturing an inoculum from the column, proceeds to the
last effluent fraction, and works backwards toward
earlier fractions. Once a positive colony is found, no
further tests are required for that value of V_lim. If
no BPTI-positive colonies are detected, the population
of phage obtained from the column matrix and the last
few (e.g. 5 to 10) phage-bearing fractions are merged
and cultured. Phage are harvested from this culture
and chromatographed by the above procedure. This
process continues until a positive colony is isolated
or N_chrom passes of chromatography and growth have been
completed. If no positive colonies are detected after
N_chrom passes of enrichment, V_lim is reduced by a
suitable factor and the process is repeated.
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By hypothesis, V. = 4.0 x 10^ is the largest 
value for which LC7 can be recovered. Thus C.^gp^i = 
4.0 X 10^. Three cycles of chromatography are required 
to isolate LG7, so the first approx ina t ion to C^ff is 
740 ( = exp( lcge(4.0 x 10^)/3 ) ). 
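Written out, the first approximation rests on the assumption implicit in the text that each of the N_chrom = 3 cycles enriches LG7 by the same factor C_eff, so that the cycles together span the 1 : C_sensi starting ratio:

```latex
C_{\mathrm{eff}}^{\,N_{\mathrm{chrom}}} \approx C_{\mathrm{sensi}}
\quad\Longrightarrow\quad
C_{\mathrm{eff}} \approx \exp\!\left(\frac{\ln C_{\mathrm{sensi}}}{N_{\mathrm{chrom}}}\right)
= \left(4.0\times10^{8}\right)^{1/3} \approx 737 \approx 740
```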
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We now deterainc the efficiency of the affinity 
separation (Sec. 10.2). This is done by: a) preparing 
nixtures of LC7 and LGll in- the rotio 1:Q, b) enriching 
the population fcr LG7 for one separation cycle, and c) 
dctemining the fraction of LC7 in the lasf phage- 
bcarinq fraction. The phage are obtained from cultures 
induced ac 10. 0 uM IPTG, the optinal level. Q is 
decreased uncil roughly half the phage^are LP7 . We 
start with Q = 1.5 x 10^* = 20 x apprcxinate C^ff. The 
mixture is applied to a (AHTrp} colunn bearing 4.0 rag 
AHTrp on 1.0 nl of Affi-Cel 10 (the optimal DoAMoM) and 
cluted in the specified manner. A sample of 4 ul frorn 
each fraction is plated at suitable dilution on phage 
sensitive cells on LB agar. The identity of colonies 
in the last phage-bcar ing fraction is deterTiined by 
transferring colonies to ampici 1 1 in-conta in i ng and 
tetracycline-containlng plates; colonies that sncv Tet^ 
are from L>311 and colonies that show Ar.?^ are fron LCI, 
When Q is 1.5 X 10^.. of colonics are BPTI pcsitive^ 
When Q is 1.5 x 10^, oO^ of the colonies are BPTI 



positive. Thus ve calcuUt.^ C^ff 
900. 



.60 X 1.5 lO-" 
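The arithmetic of this calibration can be reproduced in a few lines. In the Python sketch below the function names are our own, not the document's; it computes the first approximation to C_eff, the starting ratio Q = 20 x C_eff, and the estimate C_eff = (fraction of LG7) x Q used in the text.

```python
import math


def c_eff_first_approx(c_sensi: float, n_chrom: int) -> float:
    """Per-cycle enrichment if n_chrom equal cycles span a 1:c_sensi ratio."""
    return math.exp(math.log(c_sensi) / n_chrom)


def c_eff_estimate(q: float, fraction_lg7: float) -> float:
    """Estimate used in the text: fraction of LG7 after one cycle times Q."""
    return fraction_lg7 * q


approx = c_eff_first_approx(4.0e8, 3)  # about 737, quoted as 740
q_start = 20 * approx                  # about 1.5 x 10^4
print(f"first approximation: {approx:.0f}")
print(f"starting Q: {q_start:.2e}")
print(f"C_eff from Q = 1.5e3 and 60% LG7: {c_eff_estimate(1.5e3, 0.60):.0f}")
```

The measured value (900) is close to the first approximation (about 740), which is the kind of agreement the calibration is meant to check.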
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Myoglobin. is strongly colored and it is possible 
.that binding of UIIKb to M13 could provide enough 
optical absorption to allow FACS sorting ct M13 that 
bind HH.^-lb (See Sec. 10.4). 

We have now constructed LG7, which displays one or
more BPTI domains on each virion. The osp-ipbd gene is
under control of the lacUV5 promoter so that expression
levels of BPTI-M13 CP can be manipulated via [IPTG].
This construct may be used to develop many different
binding proteins, all based on BPTI. An optimum level
of induction has been determined. An optimum amount of
AfM(PBD) = DoAMoM_optimum = 2.0 mg/(ml of support) has
been determined; target molecules will be applied to
columns at this level in the process disclosed in Sec.
16.1. These optimum levels may be adequate for all
targets and all variations of BPTI displayed on
derivatives of M13 based on LG7, but some further
optimization may be needed if other values of pH or
temperatures are used.

Other pbd gene fragments may be substituted for
the bpti gene fragment in pLG7 with a high likelihood
that the PBD will appear on the surface of the new LG7
derivative.

Example 1. Part III

HHMb is chosen as a typical protein target; any
other protein could be used. HHMb satisfies all of the
criteria for a target: 1) it is large enough to be
applied to an affinity matrix, 2) after attachment it
is not reactive, and 3) after attachment there is
sufficient unaltered surface to allow specific binding
by PBDs.

The essential information for HHMb is known: 1)
HHMb is stable at least up to 70°C, between pH i.4 and
9.3, 2) HHMb is stable up to 1.6 M guanidinium Cl, 3)
the pI of HHMb is 7.0, 4) for HHMb, M_r = 17,000, 5)
HHMb requires heme, 6) HHMb has no proteolytic
activity.

In addition, the following information about HHMb
and other myoglobins is available: 1) the sequence of
HHMb is known, 2) the 3D structure of sperm whale
myoglobin is known; HHMb has 19 amino acid differences,
and it is generally assumed that the 3D structures are
almost identical, 3) HHMb has no enzymatic activity, 4)
HHMb is not toxic.

We set the specifications of an SBD as:

1) T = 25°C

2) pH = 8.0

3) Acceptable solutes:

A) for binding:

i) phosphate, as buffer, 0 to 20 mM, and

ii) KCl, 10 mM,

B) for column elution:

i) phosphate, as buffer, 0 to 30 mM,

ii) KCl, up to 5 M, and

iii) guanidinium Cl, up to 0.8 M.



4) Acceptable < 1.0 x 10 
We choose LG7 as GP(IPBD).

As stated in Sec. 13, the residues to be varied
are picked, in part, through the use of interactive
computer graphics to visualize the structures. In this
section, all residue numbers refer to BPTI. We pick a
set of residues that forms a surface such that all
residues can contact one target molecule. Information
that we refer to during the process of choosing
residues to vary includes: 1) the 3D structure of BPTI,
2) solvent accessibility of each residue as computed by
the method of Lee and Richards (LEEB71), 3) a
compilation of sequences of other proteins homologous
to BPTI, and 4) knowledge of the structural nature of
different amino acid types.
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Tables 16 and 34 indicate which residues oC B?TI; 
a) have substantial surface exposure, and b) are known 
to tolerate other anino acids in other closely related 
proteins. We use interactive co-puter graphics to pick 
sets of oiqht to twenty residues that are exposed and 
variable and such that all r.e-bers of one set c^n Couch 
a molecule of the target naceridi at one ti-e. If BPTI 
has a s.-all aaino acid at a given residue, that anino 
acid nay not be able to contact the target 
simultaneously vith all the other residues in the 
interaction set, but a larger Aiaino acid night well 
xake contact. A charged acino acid night affect 
binding without snaking direct contact. In such cases, 
the residue should be included in the interaction set, 
with a notation that larger residues night be useful. 
In a sir.ilar way, large ar.ino acids near the geo:netric 
center of "he interaction set nay prevent residues on 
either side of the large central residue from making 
sinultai'.cous contact. If a s-.=vll amino acid, however, 
were substituted for the larqc anino acid, then the 
surface wcLild beco.no flatter and residues on either 
side could -ake sinultaneous ccntact. ' Such a residue 
should be included in the interaction set with a 
notation that small a.nino acids =ay be useful. 



Table 35 was prepared from standard model parts
and shows the maximum span between C_beta and the tip of
each type of side group. C_beta is used because it is
rigidly attached to the protein main chain; rotation
about the C_alpha-C_beta bond is the most important
degree of freedom for determining the location of the
side group.
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Table 3; indicates five surfaces that -eec Che 
given criteria. The first surface co^nprises cne set of 
residues that actually contacts trypsin in the coniplex 
of trypsin with 8PTI as reported in the Sroolchaven 
Fronein Data Bank entry ■UTPA". This set is indicated 
by the nuir.ber "l'*. The exposed surface of . the residues 
in this set (taken frori Table 16) tot;.>s 11-18 . 
Although this is not strictly the area o: contact 
between BPTI and trypsin, it is aporoxir^ately tr.e same. 

Other surfaces, numbered 2 to 5, were picked by
first picking one exposed, variable residue and then
picking neighboring residues until a surface was
defined. The choice of sets of residues shown in Table
34 is in no way exhaustive or unique; other sets of
variable, surface residues can be picked. Set #2 is
shown in stereo view in Figure 10, including the alpha
carbons of BPTI, the disulfide linkages, and the side
groups of the set. We take the orientation of BPTI in
Figure 10 as a standard orientation, and hereinafter
refer to K15 as being at the top of the molecule, while
the carboxy and amino termini are at the bottom.

Solvent accessibilities are useful, easily
tabulated indicators of a residue's exposure. Solvent
accessibilities must be used with some caution; small
amino acids are under-represented and large amino acids
over-represented. The user must consider what the
solvent accessibility of a different amino acid would
be when substituted into the structure of BPTI.

To create specific binding between a derivative of
BPTI and HHMb, we will vary the residues in set #2.
This set includes the twelve principal residues 17(R),
19(I), 21(Y), 27(A), 28(G), 29(L), 31(Q), 32(T), 34(V),
48(A), 49(E), and 52(M) (Sec. 13.1.1). None of the
residues in set #2 is completely conserved in the
sample of sequences reported in Table 34; thus we can
vary them with a high probability of retaining the
underlying structure. Independent substitution at each
of these twelve residues of the amino acid types
observed at that residue would produce approximately
4.4 x 10^ amino acid sequences and the same number of
surfaces.

BPTI is a very basic protein. This property has
been used in isolating and purifying BPTI and its
homologues, so the high frequency of arginine and
lysine residues may reflect bias in isolation and is
not necessarily required by the structure. Indeed,
SCI-III from Bombyx mori contains seven more acidic
than basic groups (SASA84).

Residue 17 is highly variable and fully exposed
and can contain R, A, Y, M, F, L, M, T, Y, P, or
S. All types of amino acids are seen: large, small,
charged, neutral, and hydrophobic. That no acidic
groups are observed may be due to bias in the sample.

Residue 19 is also variable and fully exposed,
containing P, R, I, S, K, Q, and L.



Residue 21 is not very variable, containing F or Y
in 31 of 33 cases and I and W in the remaining cases.
The side group of Y21 fills the space between T32 and
the main chain of residues 47 and 48. The OH at the
tip of the Y side group projects into the solvent.
Clearly one can vary the surface by substituting Y or F
so that the surface is either hydrophobic or
hydrophilic in that region. It is also possible that
... hydrophobics (L, M, or V) might be tolerated ...
observed. On structural grounds, this ...
acid and perhaps any amino acid.

... in BPTI. This residue is in a turn, but is not ...
observed at ... Other types of amino acids have been
... this residue: K, ..., Q, ... simultaneously ...
this residue might not ... with residues 17 and 34.
... residues 17 and ... interact with HHMb at the
same ... binding of HHMb on the surface defined by the
other residues of the principal set. Any amino acid,
except perhaps P, should be tolerated.

Residue 29 is highly variable, most often
containing L. This fully exposed residue should
probably tolerate almost any amino acid except,
perhaps, P.

Residues 31, 32, and 34 are highly variable,
exposed, and in extended conformations; any amino acid
should be tolerated.

Residues 48 and 49 are also highly variable and
fully exposed; any amino acid should be tolerated.

Residue 52 is in an alpha helix. Any amino acid,
except perhaps P, might be tolerated.
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Now we consider possible variation of the 
secondary set {Sec. 11.1.2) of residues that are in the 
neighborhood of the principal set. Neighboring 
residues thnt might be varied at later stages include 
5 9(P), 11(T), 15(K), 16(A), 13(1), 20(R), 22fF), 24 (M) , 
26 {K) , 35(^), 47(S), 50(0), and 53 (R) . ' 

Residue 9 is highly variable, extended, and
exposed. Residue 9 and residues 48 and 49 are
separated by a bulge caused by the ascending chain from
residue 31 to 34. For residue 9 and residues 48 and 49
to contribute simultaneously to binding, either the
target must have a groove into which the chain from 31
to 34 can fit, or all three residues (9, 48, and 49)
must have large amino acids that effectively reduce the
radius of curvature of the BPTI derivative.

Residue 11 is highly variable, extended, and
exposed. Residue 11, like residue 9, is slightly far
from the surface defined by the principal residues and
will contribute to binding in the same circumstances.



Residue 15 is highly varied. The side group of
residue 15 points away from the face defined by set #2.
Changes of charge at residue 15 could affect binding on
the surface defined by residue set #2.

Residue 16 is varied but points away from the
surface defined by the principal set. Changes in
charge at this residue could affect binding on the face
defined by set #2.

Residue 18 is I in BPTI. This residue is in an
extended conformation and is exposed. Five other amino
acids have been observed at this residue: M, F, L, V,
and T. Only T is hydrophilic. The side group points
directly away from the surface defined by residue set
#2. Substitution of charged amino acids at this
residue could affect binding at the surface defined by
residue set #2.

Residue 20 is R in BPTI. This residue is in an
extended conformation and is exposed. Four other amino
acids have been observed at this residue: A, S, L, and
Q. The side group points directly away from the
surface defined by residue set #2. Alteration of the
charge at this residue could affect binding at the
surface defined by residue set #2.
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Residue 22 is only slightly varied, being Y, F, or 
H in 30 of 33 cases. nevertheless. A, N, and S have 
been : observed at this residue. Amino acids such as L, 
H, I, or Q could be tried here. Alterations At residue 
22 may affect the niobility of residue 21; changes in 
charge at residue ' 22 could affect binding at the 
surface defined by. residue set =2. 

Residue 24 shows some variation, but probably cannot
interact with one molecule of the target
simultaneously with all the residues in the principal
set. Variation in charge at this residue might have an
effect on binding at the surface defined by the
principal set.



Residue 26 is highly varied and exposed. Changes
in charge may affect binding at the surface defined by
residue set #2; substitutions may affect the mobility
of residue 27, which is in the principal set.



Residue 35 is most often Y; W has been observed.
The side group of 35 is buried, but substitution of F
or W could affect the mobility of residue 24.

Residue 47 is always T or S in the sequence sample
used. The O_gamma probably accepts a hydrogen bond from
the NH of residue 50 in the alpha helix. Nevertheless,
there is no overwhelming steric reason to preclude
other amino acid types at this residue. In particular,
other amino acids the side groups of which can accept
hydrogen bonds, viz. N, D, Q, and E, may be acceptable
here.

Residue 50 is often an acidic amino acid, but
other amino acids are possible.

Residue 53 is often R, but other amino acids have
been observed at this residue. Changes of charge may
affect binding to the amino acids in interaction set
#2.

Stereo Figure 10 shows the residues in set #2,
plus R39. From Figure 10, one can see that R39 is on
the opposite side of BPTI from the surface defined by
the residues in set #2. Therefore, variation at
residue 39 at the same time as variation of some
residues in set #2 is much less likely to improve
binding that occurs along surface #2 than is variation
of the other residues in set #2.

In addition to the twelve principal residues and
13 secondary residues, there are two other residues,
30(C) and 33(F), involved in surface #2 that we will
probably not vary, at least not until late in the
procedure. These residues have their side groups
buried inside BPTI and are conserved. Changing these
residues does not change the surface nearly so much as
does changing residues in the principal set. These
buried, conserved residues do, however, contribute to
the surface area of surface #2. The surface of residue
set #2 is comparable to the area of the trypsin-binding
surface. Principal residues 17, 19, 21, 27, 28, 29,
31, 32, 34, 48, 49, and 52 have a combined
solvent-accessible area of 946.9 Å^2. Secondary residues
9, 11, 15, 16, 18, 20, 22, 24, 26, 35, 47, 50, and 53
have a combined surface of 1041.7 Å^2. Residues 30 and
33 have exposed surface totaling 38.2 Å^2. Thus the
three groups' combined surface is 2026.8 Å^2.
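The three group areas should add up to the stated total. A quick Python check, using 946.9 Å^2 for the principal residues, 1041.7 Å^2 for the secondary residues, and 38.2 Å^2 for residues 30 and 33 (the readings that are consistent with the 2026.8 Å^2 total), with residue lists typed in from the text:

```python
# Solvent-accessible areas (Å^2) for the residue groups of surface #2.
GROUPS = {
    "principal": ([17, 19, 21, 27, 28, 29, 31, 32, 34, 48, 49, 52], 946.9),
    "secondary": ([9, 11, 15, 16, 18, 20, 22, 24, 26, 35, 47, 50, 53], 1041.7),
    "buried, conserved": ([30, 33], 38.2),
}

n_residues = sum(len(residues) for residues, _ in GROUPS.values())
total_area = round(sum(area for _, area in GROUPS.values()), 1)
print(n_residues, "residues,", total_area, "A^2")
```

Rounding to one decimal avoids spurious floating-point digits in the sum.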

Residue 30 is C in BPTI and is conserved in all
homologous sequences. It should be noted, however,
that C14/C38 is conserved in all natural sequences, yet
Marks et al. (MARK87) showed that changing both C14 and
C38 to A,A or T,T yields a functional trypsin
inhibitor. Thus it is possible that BPTI-like
molecules will fold if C30 is replaced.

Residue 33 is F in BPTI and in all homologous
sequences. Visual inspection of the BPTI structure
suggests that substitution of Y, M, H, or L might be
tolerated.
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Uaving identified t'wcnty residues that define a 
possible binding surface, wc r.ust choose some to vary 
first. Given our hypothetical affinity separation 

30 sensitivity, Cs^pj^j^, we decide to vary six residues 
leaving sone margin for errors in the actual base 
composition cf variegated bases. To obtain maxi.-nal 
recognition, we choose residues from the principal set 
that arc as far apart as possible. Table 36 shows the 

35 distances between the beta carbons of residues in the 
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principal and peripheral- set. R17 and V34 are at one 
end of the principal surface. Residues A27, C28, L23, t;. 
AA&, and M52 are at the other end, about fwenty 



Angstroms away; of these, we will vary residues 17, 27, 
5 29, 3-;, and 48. Residues 28 , 49, and 52 will be, varied 
at later rounds. 

Of the remaining principal residues, 21 is left to
later variations. Among residues 19, 31, and 32, we
arbitrarily pick 19 to vary.

Unlimited variation of six residues produces 6.4 x
10^7 amino acid sequences. By hypothesis, C_sensi is 1
in 4 x 10^8. Table 37 shows the programmed variegation
at the chosen residues. The parental sequence is
present as 1 part in 5.5 x 10^7, but the least favored
sequences are present at only 1 part in 4.2 x 10^9.
Among single-amino-acid substitutions from the PPBD,
the least favored is F17-I19-A27-L29-V34-A48 and has a
calculated abundance of 1 part in 1.6 x 10^8. Using the
optimal qfk codon, we can recover the parental sequence
and all one-amino-acid substitutions to the PPBD if
actual nt compositions come within 5% of programmed
compositions. The number of transformants is M_ntv =
1.0 x 10^ (also by hypothesis), thus we will produce
most of the programmed sequences.
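The abundances discussed here derive from the qfk base compositions. The Python sketch below recomputes only the error-free case. Its compositions, q = (0.26 T, 0.18 C, 0.26 A, 0.30 G), f = (0.22 T, 0.16 C, 0.40 A, 0.22 G) and k = (0.5 G, 0.5 T), are an assumption obtained by complementing the mixtures listed for the synthesized strand later in this section. Tables 12 and 37 also allow for 5% composition errors, so their figures differ somewhat from these (error-free, the parental sequence comes out near 1 in 4.9 x 10^7).

```python
import math
from itertools import product

# Error-free qfk compositions (inferred; see the lead-in).
Q = {"T": 0.26, "C": 0.18, "A": 0.26, "G": 0.30}
F = {"T": 0.22, "C": 0.16, "A": 0.40, "G": 0.22}
K = {"T": 0.50, "G": 0.50}

# Standard genetic code, codons enumerated in TCAG order; '*' marks stop.
BASES = "TCAG"
AA = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODE = {b1 + b2 + b3: AA[16 * i + 4 * j + k]
        for (i, b1), (j, b2), (k, b3) in product(enumerate(BASES), repeat=3)}


def qfk_distribution():
    """Probability of each amino acid (and stop) at one qfk-encoded position."""
    dist = {}
    for codon, aa in CODE.items():
        p = Q[codon[0]] * F[codon[1]] * K.get(codon[2], 0.0)
        if p > 0:
            dist[aa] = dist.get(aa, 0.0) + p
    return dist


dist = qfk_distribution()
parent = "RIALVA"  # BPTI residues 17, 19, 27, 29, 34, 48
p_parent = math.prod(dist[aa] for aa in parent)
print("outcomes per position:", len(dist))
print(f"stop codons: {dist['*']:.3f}")
print(f"parental sequence: 1 in {1 / p_parent:.2e}")
print(f"unlimited variation of six residues: {20 ** 6:.1e} sequences")
```

Only TAG of the three stop codons survives the G/T third position, which is why k is restricted to G and T.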



The residue numbers of the preceding section are
referred to mature BPTI (R1-P2-...). Table 25 has
residue numbers referring to the pre-M13CP-BPTI
protein: all mature BPTI sequence numbers have been
increased by the length of the signal sequence, i.e.,
23. Thus in terms of the pre-OSP-PBD residue numbers,
we wish to vary residues 40, 42, 50, 52, 57, and 71. A
DNA subsequence containing all these codons is found
between the (Apa I/Dra II/Pss I) sites at base 191 and
the Sph I site at base 309 of the osp-pbd gene. Among
Apa I, Dra II, and Pss I, Apa I is preferred because it
recognizes six bases without any ambiguity. Dra II and
Pss I, on the other hand, recognize six bases with
two-fold ambiguity at two of the bases. The vgDNA will
contain more Dra II and Pss I recognition sites at the
varied locations than it will contain Apa I recognition
sites. The unwanted extraneous cutting of the vgDNA by
Apa I and Sph I will eliminate a few sequences from our
population. This is a minor problem, but by using the
more specific enzyme (Apa I), we minimize the unwanted
effects. The sequence shown in Table 37 illustrates an
additional way in which gratuitous restriction sites
can be avoided in some cases. The osp-ipbd gene had
the codon GGC for G51; because we are varying both
residues 50 and 52, it is possible to obtain an Apa I
site. If we change the glycine codon to GGT, the Apa I
site can no longer arise. Apa I recognizes the DNA
sequence (GGGCC/C).
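Two points in this paragraph can be checked mechanically: the shift from mature BPTI numbering to pre-protein numbering (signal sequence of 23 residues), and the claim that codon GGC at G51, flanked by qfk codons at 50 and 52, can create an Apa I site (GGGCCC) while GGT cannot. In the Python sketch below the helper names are hypothetical, and only the nine bases of codons 50 to 52 are examined.

```python
from itertools import product

SIGNAL_LENGTH = 23   # length of the signal sequence, from the text
APA_I = "GGGCCC"     # Apa I recognition sequence (cuts GGGCC/C)

# Every codon a qfk position can produce: any first base, any second base, G or T third.
QFK_CODONS = ["".join(c) for c in product("TCAG", "TCAG", "GT")]


def pre_protein_number(mature_residue: int) -> int:
    """Convert a mature-BPTI residue number to the pre-protein numbering."""
    return mature_residue + SIGNAL_LENGTH


def apa_i_can_arise(codon51: str) -> bool:
    """True if some pair of qfk codons at 50 and 52 creates GGGCCC around codon 51."""
    return any(APA_I in c50 + codon51 + c52
               for c50 in QFK_CODONS for c52 in QFK_CODONS)


varied = [pre_protein_number(r) for r in (17, 19, 27, 29, 34, 48)]
print("pre-protein residues:", varied)
print("GGC at 51 allows Apa I:", apa_i_can_arise("GGC"))
print("GGT at 51 allows Apa I:", apa_i_can_arise("GGT"))
```

With GGC, a codon 50 ending in G followed by a codon 52 beginning CC completes GGG CC C; with GGT the T breaks every possible alignment of the site.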




Each piece of dsDNA to be synthesized needs six to
eight bases added at either end to allow cutting with
restriction enzymes and is shown in Table 37. The
first synthetic base (before cutting with Apa I and
Sph I) is 184 and the last is 322. There are 142 bases
to be synthesized. The center of the piece to be
synthesized lies between F56 and V57. The overlap
cannot include varied bases, so we choose bases 245 to
256 as the overlap that is 12 bases long. Note that the
codon for F56 has been changed to TTC to increase the
GC content of the overlap. The amino acids that are
being varied are marked as X with a plus over them.
Codons 57 and 71 are synthesized on the sense (bottom)
strand. The design calls for "qfk" in the antisense
strand, so that the sense strand contains (from 5' to
3') a) equal parts C and A (i.e. the complement of k),
b) (0.40 T, 0.22 A, 0.22 C, and 0.16 G) (i.e. the
complement of f), and c) (0.26 T, 0.26 A, 0.30 C, and
0.18 G).
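The mixtures listed for the opposite strand are a position-by-position reverse complement of the qfk codon: the complement of the third position (k) comes first at the 5' end, and each base fraction is carried over to its complementary base. A minimal Python sketch; the q, f and k compositions are inferred from the listed mixtures, since this passage does not state them directly.

```python
COMPLEMENT = {"A": "T", "T": "A", "G": "C", "C": "G"}

# qfk compositions (codon sense), inferred from the mixtures in the text.
q = {"T": 0.26, "C": 0.18, "A": 0.26, "G": 0.30}
f = {"T": 0.22, "C": 0.16, "A": 0.40, "G": 0.22}
k = {"G": 0.50, "T": 0.50}


def complement_mix(mix):
    """Complement every base of a degenerate position, keeping its fraction."""
    return {COMPLEMENT[base]: frac for base, frac in mix.items()}


def opposite_strand(codon_mixes):
    """Mixtures to use, 5' to 3', on the strand opposite a degenerate codon."""
    return [complement_mix(m) for m in reversed(codon_mixes)]


for label, mix in zip("abc", opposite_strand([q, f, k])):
    print(label, dict(sorted(mix.items())))
```

The printed mixtures a), b) and c) correspond to the three lists given in the paragraph above.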



Each residue 
possible outcomes, 



that is encoded by "qf/i" has 21 
each of the anino acids plus stop. 
Table 12 gives the distribution of amino acids encoded 
10 by "qfk", assuaing 5\ errors. The abundance of the 
parental sequence is the product of the abundances of R 
xrxA>:LxV:<A. The abundance of the Least- 
favored sequence is 1 in 4.2 x 10^. 




Olig#27 and olig#28 are annealed and extended with
Klenow fragment and all four (nt)TPs. Both the ds
synthetic DNA and RF pLG7 DNA are cut with both Apa I
and Sph I. The cut DNA is purified and the appropriate
pieces ligated (See Sec. 14.1) and used to transform
competent PE333 (See Sec. 14.2). In order to generate a
sufficient number of transformants, Vt is set to 5000
ml.
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1) culture £ . co I 1 in 5.0 1 of L3 broth at ZV^C 
until cei: 
cells/nl. 



until cell density reaches 5 x lo'' to 7 x lo"^ 



3 0 



2) chill on ice for 65 ninutcs, centrifuge the 
cell suspension at <000g for 5 ainutes at A^C, 

3) discard supernatant; resuspend the cells in 
1667 ml of an ice-ccLd, sterile solution of 60 
nM CaClj, 



35 



4) Chill on ic^ for 15 minutes, and then 
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centrifuge at AOQQq for 5 rainutes at A^C, 

5) discard supernatant; resuspend ceils in 2 x 
400 nil of ice-cold, sterile 60 mM CaCij.* store 
cells at 4°C for 24 hours, 

6) add DNA in ligation or TE buffer; nix and 
store on ice for 30 ninutes; 20 nl of solution 
containing 5 ug/.-nl of DMA is used, 

7) heat shock cells at 42°C for 90 seconds. 
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3) add 200 ml LB broth and incubate at 37°C for 
1 hour, 

9) add the culture to 2.0 1 of L8 broth 
containing anpicillin at 35-100 ug/r.l and 
culture for. 2 hours at 37^c. 

10) centrifuge at 5000g for 20 minutes at 4°C,

11) discard supernatant, resuspend cells in 50
ml of LB broth plus ampicillin and incubate 1
hour at 37°C,

12) plate cells on LB agar containing
ampicillin,

13) harvest virions by the method of Salivar et al.
(SALI64).



The heat shock of step (7) can be done by dividing the
200 ml into 1000 aliquots of 200 µl in 1.5 ml plastic
Eppendorf tubes. It is possible to optimize the heat
shock for other volumes and kinds of container. It is
important to: a) use all or nearly all the vgDNA
synthesized in ligation; this will require large
amounts of pLG7 backbone, b) use all or nearly all the
ligation mixture to transform cells, and c) culture all
or nearly all the transformants at high density. These
measures are directed at maintaining diversity.
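To see why nearly every transformant matters, consider a simple sampling model (an assumption for illustration only): the library has N equally abundant variants and each transformant carries one chosen at random. The expected fraction of variants present among T transformants is then about 1 - exp(-T/N). Because real vgDNA is not uniform, this is an optimistic estimate.

```python
import math


def expected_coverage(n_transformants, library_size):
    """Expected fraction of a uniform library seen at least once
    (Poisson approximation to sampling with replacement)."""
    return 1.0 - math.exp(-n_transformants / library_size)


library = 20 ** 6  # six fully varied residues: 6.4 x 10^7 amino-acid sequences
for t in (1e7, 6.4e7, 3e8):
    print(f"{t:.1e} transformants -> {expected_coverage(t, library):.3f}")
```

Even with as many transformants as there are sequences, only about 63% of a uniform library is expected to be represented, which is why losing any of the ligation mixture or transformants costs diversity.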



IPTG is added to the growth medium at 2.0 mM (the
optimal level) and virions are harvested in the usual
way (Sec. 14.3). It is important to collect virions in
a way that samples all or nearly all the transformants.
Because F- cells are used in the transformation,
multiple infections do not pose a problem.

HHMb has a pI of 7.0 and we carry out
chromatography at pH 8.0 so that HHMb is slightly
negative while BPTI and most of its mutants are
positive. HHMb is fixed (Sec. 15.1) to a 2.0 ml column
on Affi-Gel 10(TM) or Affi-Gel 15(TM) at 4.0 mg/ml
support matrix, the same density that is optimal for a
column supporting trp.



We note that charge repulsion between BPTI and
HHMb should not be a serious problem and does not
impose any constraints on ions or solutes allowed as
eluants. Neither BPTI nor HHMb has special
requirements that constrain the choice of eluants. The
eluant of choice is KCl in varying concentrations.
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To remove variants of BPTI with strong, 
indiscriminate binding for any protein or f^r the 
support matrix (Sec. 15.2), wo pass the variegaced 
population of virions over a column that supports 
bovine serum albumin (DSA) before loading the 
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popuUcicn onco the {HHKb) coiunn. Atfi-Cel io('^**^) or 
Affi-Cel ist"^''^! is used to innobiii^c 3SA ac the. 
highest level the matrix will support. A 10.0 ml 
colurr.n is loaded with 5.0 ml of Af f i-Ce I- 1 inked-BSA; 
this colunn, called (BSA), has - 5,0 ml. The 

variegated 'population of virions containing 10^^ pfu in 
1 nl (0.2 X Vy) of 10 r-M KCl, 1 phosphate, pH 8.C 

buffer is applied to (BSAJ. v;e wash iBSA) with 4 . 5 nl 
(0.9 X Vv) of 50 n!I KCl, 1 nuM phosphate, pH 0.0 buffer. 
The wash with 50 ituM salt will elute virions that adhere 
slightly to 3SA but not virions with strong binding. 
The pooled effluent of the {BSA) colunin is 5.5 ml of. 
approximately 13 ri^^ KCl. 
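The KCl concentration of the pooled (BSA) effluent is the volume-weighted mean of the 1 ml load at 10 mM and the 4.5 ml wash at 50 mM. A quick Python check, assuming ideal mixing and ignoring the phosphate buffer:

```python
def mixed_concentration(portions):
    """Volume-weighted mean concentration of pooled (volume, conc) portions."""
    total_volume = sum(v for v, _ in portions)
    return sum(v * c for v, c in portions) / total_volume, total_volume


conc_mM, volume_ml = mixed_concentration([(1.0, 10.0), (4.5, 50.0)])
print(f"{volume_ml} ml of {conc_mM:.1f} mM KCl")  # 5.5 ml of 42.7 mM KCl
```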

The column (HHMb) is first blocked by treatment
with 10^… virions of M13(am429) in 100 µl of 10 mM KCl
buffered to pH 8.0 with phosphate; the column is washed
with the same buffer until OD280 returns to base line
or 2 x Vv have passed through the column, whichever
comes first. The pooled effluent from (BSA) is added
to (HHMb) in 5.5 ml of 43 mM KCl, 1 mM phosphate, pH
8.0 buffer. The column is eluted (Sec. 15.3) in the
following way:

1) 10 mM KCl buffered to pH 8.0 with phosphate,
until optical density at 280 nm falls to base line
or 2 x Vv, whichever is first (effluent
discarded),

2) a gradient of 10 mM to 2 M KCl in 3 x Vv, pH
held at 8.0 with phosphate (30 x 100 µl
fractions),

3) a gradient of 2 M to 5 M KCl in 3 x Vv,
phosphate buffer to pH 8.0 (30 x 100 µl
fractions),

4) constant 5 M KCl plus 0 to 0.8 M guanidinium Cl
in 2 x Vv, with phosphate buffer to pH 8.0 (20 x
100 µl fractions), and

5) constant 5 M KCl plus 0.8 M guanidinium Cl in 1
x Vv, with phosphate buffer to pH 8.0 (10 x 100
µl fractions).
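Assuming linear gradients and Vv = 1.0 ml for the (HHMb) column (the value that makes 3 x Vv equal to 30 fractions of 100 µl; it is not stated explicitly), the nominal KCl concentration at the midpoint of each gradient fraction can be tabulated. This Python sketch covers steps 2 and 3 and ignores mixing and dead volume.

```python
def gradient_fractions(start, end, n_fractions):
    """Midpoint concentration of each equal-volume fraction of a linear gradient."""
    step = (end - start) / n_fractions
    return [start + (i + 0.5) * step for i in range(n_fractions)]


# Step 2: 10 mM -> 2000 mM KCl over 30 fractions of 100 ul.
step2 = gradient_fractions(10.0, 2000.0, 30)
# Step 3: 2000 mM -> 5000 mM KCl over 30 fractions of 100 ul.
step3 = gradient_fractions(2000.0, 5000.0, 30)
schedule = step2 + step3
for number, conc in enumerate(schedule, start=1):
    print(f"fraction {number:2d}: {conc:7.1f} mM KCl")
```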



In addition to the elution fractions, a sample is
removed from the column and used as an inoculum for
phage-sensitive Sup- cells (Sec. 15.4). A sample of 4
µl from each fraction is plated on phage-sensitive Sup-
cells. Fractions that yield too many colonies to
count are replated at lower dilution. An approximate
titre of each fraction is calculated. Starting with
the last fraction and working toward the first fraction
that was titered, we pool fractions until approximately
10^9 phage are in the pool, i.e. about 1 part in 10^3 of
the phage applied to the column. This population is
infected into 3 x 10^… phage-sensitive PE334 in 300 ml
of LB broth. The very low multiplicity of infection
(moi) is chosen to reduce the possibility of multiple
infection. After thirty minutes, viable phage have
entered recipient cells but have not yet begun to
produce new phage. Phage-borne genes are expressed at
this phase, and we can add ampicillin that will kill
uninfected cells. These cells still carry F-pili and
will absorb phage, helping to prevent multiple
infections.
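The titering and pooling rule can be written down directly. In this Python sketch the colony counts, dilutions and 100 µl fraction volume are hypothetical; a fraction's phage content is estimated as colonies x dilution / plated volume x fraction volume, and fractions are pooled from the last one backwards until the target (about 10^9 phage in the text) is reached.

```python
def titre(colonies, plated_ul, dilution=1.0):
    """Phage per microlitre estimated from a plated sample."""
    return colonies * dilution / plated_ul


def pool_from_last(phage_per_fraction, target):
    """Pool fractions from the last one backwards until `target` is reached.

    Returns the indices pooled (in pooling order) and the phage total.
    """
    pooled, total = [], 0.0
    for index in range(len(phage_per_fraction) - 1, -1, -1):
        pooled.append(index)
        total += phage_per_fraction[index]
        if total >= target:
            break
    return pooled, total


# Hypothetical data: 100 ul fractions, 4 ul plated, (colonies, dilution factor).
plates = [(250, 1e6), (180, 1e5), (300, 1e4), (120, 1e3), (40, 1.0)]
phage = [titre(c, 4.0, d) * 100.0 for c, d in plates]
indices, total = pool_from_last(phage, 1e9)
print(indices, f"{total:.2e}")
```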

If multiple infection should pose a problem that
cannot be solved by growth at low multiplicity of
infection on F+ cells, the following procedure can be
employed to obviate the problem. Virions obtained from
the affinity separation are infected into F+ E. coli
and cultured to amplify the genetic messages (Sec.
15.5). CCC DNA is obtained either by harvesting RF DNA
or by in vitro extension of primers annealed to ss
phage DNA. The CCC DNA is used to transform F- cells at
a high ratio of cells to DNA. Individual virions
obtained in this way should bear only proteins encoded
by the DNA within.

The variegation produced as many as 6.4 x 10^7
different amino-acid sequences. Thus,
after two separation cycles, the probability of
isolating a single SBD is less than 0.10; after three
cycles, the probability rises above 0.10.

The phagemid population is grown and
chromatographed three times and then examined for SBDs
(Sec. 15.7). In each separation cycle, phage from the
last three fractions that contain viable phage are
pooled with phage obtained by removing some of the
support matrix as an inoculum. At each cycle, about
10^12 phage are loaded onto the column and about 10^9
phage are cultured for the next separation cycle.
After the third separation cycle, 32 colonies are
picked from the last fraction that contained viable
phage; phage from these colonies are denoted SBD1,
SBD2, ..., and SBD32.

Each of the SBDs is cultured and tested for
retention on a Pep-Tie column supporting HHMb (Sec.
15.8). Phage LG7(SBD11) shows the greatest retention
on the Pep-Tie (HHMb) column, eluting at 367 mM KCl,
while wtM13 elutes at 20 mM KCl. SBD11 becomes the
parental amino-acid sequence to the second variegation
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a Pcp-rLe (HHy.b; colu-n. Of this set, SHDl].-2j sho*-s 
the greatest retenticn on the Pop-Tie (HHMb) colurin.. 
elutinq dc 692 ruH KCl . 

The results of this hypothetical selection are
shown in Table 40. Residue 33 (K15 of BPTI) changed to
E, 41 becomes V, 43 goes to N, 44 goes to F, 51 goes to
F, 54 goes to S, 55 goes to A, and 72 goes to Q.

The sbd11-23 portion of the osp-pbd gene is cloned
into an expression vector and BPTI(E15, D17, V18, Q19,
N20, r21, E27, F28, L29, S31, A32, S34, W71, Q72) is
expressed in the periplasm. This protein is isolated
by standard methods and its binding to HHMb is tested.
The dissociation constant, Kd, is found to be
4.5 x 10^-7 M.



A third round of variation, using SBD11-23 as
PPBD, is illustrated in Table 41; eight amino acids are
varied. Those in the principal set, residues 40, 55,
and 57, are varied through all twenty amino acids.
Residue 32 is varied through P, Q, T, K, A, or E.
Residue 34 is varied through T, P, Q, K, A, or E.
Residue 44 is varied through F, L, Y, C, W, or stop.
Residue 50 is varied through E, K, or Q. Residue 52 is
varied through L, F, I, M, or V.
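The size of this third variegation follows from the per-residue choices listed above. The Python sketch counts amino-acid sequences only (stop at residue 44 is treated as one of its six outcomes, and codon degeneracy is ignored):

```python
from math import prod

# Allowed outcomes per varied residue, transcribed from the text above.
choices = {
    40: 20, 55: 20, 57: 20,          # principal set: all twenty amino acids
    32: len("PQTKAE"),
    34: len("TPQKAE"),
    44: len(["F", "L", "Y", "C", "W", "stop"]),
    50: len("EKQ"),
    52: len("LFIMV"),
}
n_sequences = prod(choices.values())
print(n_sequences)                 # 25920000
print(f"{n_sequences:.2e}")        # 2.59e+07
```

This is of the same order as the 6.4 x 10^7 (that is, 20^6) sequences of the first variegation, even though eight residues are varied instead of six.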

The result of this variation is shown in Table 42.
The selected SBD is denoted SBD11-23-5 and elutes from
a Pep-Tie (HHMb) column at 980 mM KCl. The sbd11-23-5
segment is cloned into an expression vector and
BPTI(E…, Q11, E15, A17, V18, Q19, …20, W21, Q27, F28,
M29, S31, L32, …, W71, Q72) is produced. This time
the Kd is 7.3 x 10^… M.



This example is hypothetical. It is anticipated
that more variegation cycles will be needed to achieve
dissociation constants of 10^… M. It is also possible
that more than three separation cycles will be needed
in some variegation cycles. Real DNA chemistry and DNA
synthesizers may have larger errors than our
hypothetical 5%. If Serr > 0.05, then we may not be
able to vary six residues at once. Variation of 5
residues at once is certainly possible.



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Tabic I: Single-letter codes. 



sLnqle-l , f^t!t^r corf^ : s u^ed for pr oteins : 



ALA 
= CLY 
= MKT 
= SER 
= STOP 



C = CVS 
h = MIS 



n - 
t = 



ASM 
THR 
any a; 



d ASP 
i = ILE 
p = PKO 
V = VAL 
ir.o acid 



e = GLU' 

k = LYS 

q = GUI 

w = TRP 



f = PHE 
1 = LEU 
r = ARC 
y = TYR 



= n or d 
= e or q 

= any amino acid 



Sinale-lf t^^or r'M c^dr-s for DVA . 

T, C, A, G st.Tnd for thcnselves 

M .for A or C 

R for puRines A or G 

w for A or T 

S for c or G 

Y cor pYririidincs T cr C 
K for G or T 

V for A, C, or G (not Tl 
H for A, C, or T (not C) 
D for A, C, or T (not C) 
B for C, G, or T (not A) 

H for any base. 




I. - 




r 








o. 
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Table 2: Prercrred Outer-Surface Proteins 

Preferred 
Jenetic Outer-Surface 



10 



15 



20 



25 





Protein 


_ Reason foi;- proference 


M13 


coat protein 


a) exposed amino terminus, 

b) predictable post- 
translat iona 1 
processincj, 

c) nu.Tierous copies in 
vi rion . 




Qp r I r 


a> fusion data rivaflable 


PhiX174 


G protein 


a) known to be on virion 
exterior, 

b) snail enough that 
the G-inbd gene can 
roolace H aone . 


E. coli 


LaraB 


a) fusion data available, 
bl non-essential. 


B, Gubtilis 
spores 


Cote 


a) no post-translational 
processing, 

b) distinctive sdequcnce 
that causes protein to 
localize in spore coat, 

c) non-osr.ent i a 1 . 




CotD 


Sano ns for CotC. 



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Table 3: Ambiguous DMA for AA_SGq2 



A. 



I 

T.C 



2 

A. A. r 



k 
3 

A.A. r 



a 

.C.n 



17 
.T.n 



P 

25 
.C.n 



y 

33 
.A.y 



4 I 
.T.h 



K 

.A. r 



57 



10 11 
T.C. n! G.T. n 
A.C.yl 



P 
18 
C.C.n 



d 
26 
G.A.y 



m 
19 
A.T.G 



f 

27 
T.T. y 



A 

T.C. n 
A.C.y 



12 
C.C.n 



1 

2C 
T. r 
T.n 

c 

23 

G.y 



G. 



T. 



t g 

34 I 35 
A . C . P. I G . C - n 



i 

. 42 
A. T.h 



r 
43 
.G.n 



a g 
50 51 
G.C.ntG.G.n 



P 
36 
.C.n 



y 

44 
T.A.y 



1 

52 
T.T. r 
C.T.n 



1 

5 

T.T. r 
C.T.n 



13 
G.T. n 



s 

21 
T.C.n 
A. G.y 

1 

29 
T. T. r 
C.T.n 



c 
37 

T.G.yi A 



f 

45 
T.T.y 



c 

53 
T.G.y 



a 

14 

C.n 



f 

22 
. T . y 



e 

30 
. A. r 



38 

, A. r 



y 

46 
.A.y 



1 
7 

T.T. r 
C.T.n 



t 

15 
A.C. n 



23 
G.C.n 



P 

31 
C.C.n 



k 
S 

A. A. r 



1 

16 
T.T. r 
C.T.n 



r I 

24 
C.C.n 
A.G. r 

? I 
■J 2 I 

C.C.nl 



a r 
39 I 40 I 
C.C.njC.G.nl 
- lA.C.ri 



47 
A . A . y 



q t 
54 I 55 
, A. ri A.C-n 



53 ! 59 ! 



! g 



eo 



|C.T.n|T.A.y]C.G.n|G.G.n 



c 
61 
T.G.y 



r 

62 
CG.n 
A.G. r 



a 

63 
G.C, n 



a [ 
43 I 
G . C . n I 



56 
T.T.y j 



I ! 

A. A. r| 









Table 3, continued.







r 

65 
C.G. n 
A.G. r 



d 
73 
C . A . y 



81 
G.C.n 



66 
A.A. y 



c 

74 

T-.G.y 



a 

82 
G.C.n 



67 
A . A . y 



m 

75 
A.T.G 



e 

83 
G. A. r 



f 

68 
T.T. y 



r 

76 
C.G . n 



84 
G.C.n 



y. 

69 
A.A.r 



70 
T.C.n 
A.G.y 



a c 
71 72 
C. C. n|c . A. c 



t C I q I a ! 

77 78 !• 79 EO 

A . C . n I T . G . y t C . G . n I G , G . n I 



d 

85 
G.A.y 



d 

86 
G.A.y 



p a 
37 88 I 
CCnjcCnl 







k 
89 
A.A.r 



a 

97 
G.C.n 



90 
G.C.n 



s ■ 
. 98 
T.C.n 
A.G.y 



a I f I N 
91 I 92 I • 93 
C.C.n|T,T.y| A.A.y 



s 

94 

T.C.n 
lA.G.y 



1 

95 
T.T. r 
C.T.n 



96 



y 

105 
T. A. y 



1 

13 3 
A.T.h 



106 
G.C.n 



114 
C.T.n 



a 

99 
G.C.n 



107 
T.G.C 



t 
100 
A. C. n 



a 
108 
G.C, n 



e 
101 
G. A. r 



102 



I 1.03 104 I 
. A. yj A.T.h I G.C.n i 



rn I V 
109 I 110 I 111 
A.T.GlG.T.niG.T.n 



112 
C.T.n 



<3|<3|t|ilg|i! 
115 lib 117 118 119 12C 
G,G.n|G.C.n|A.C.n]A,T.hiG.G.nlA.T.hi 



k 


1^ 


f 


1 k 


k 


f 


C 


121 


122 


123 


1 124 


125 


126 


127 


A.A.r 


T.T. r 


T.T. y A.A. r 


A.A.r 


T.T.y 


A.C. n 




C.T.n 












a 


s 


. 








129 


130 


13 1 


132 


133 


134 




A.A.r 


G.C.n 


T.C.n 


T.A. r 


T.A. r 


T.A. r 








A.G.y 


T.G. A 


T . G . A 


T.C.A 





s 
123 
T.C.n 
A.G.y 




Table 4: Table of Restriction Enzymes

Table of restriction enzymes with IUB codes.
Suppliers:

S = Sigma Chemical Co.
    P.O. Box 14508
    St. Louis, Mo. 63178

B = Bethesda Research Laboratories
    P.O. Box 6009
    Gaithersburg, Maryland, 20877

M = Boehringer Mannheim Biochemicals
    7941 Castleway Drive
    Indianapolis, Indiana, 46250

I = International Biochemicals, Inc.
    P.O. Box 9558
    New Haven, Connecticut, 06535

N = New England BioLabs
    32 Tozer Road
    Beverly, Massachusetts, 01915

P = Promega
    2800 S. Fish Hatchery Road
    Madison, Wisconsin, 53711

T = Stratagene Cloning Systems
    11099 North Torrey Pines Road
    La Jolla, California, 92037

+ before enzyme name means that the overhang cannot be
self-complementary.

% before enzyme name means that the overhang may or may
not be self-complementary.
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Table 4, continued. 





Enzyme 


Reco»5nit. Symn 


cuts 


Supply 




Aat II 


GACGTC 


. P 


5, 


I 


<S , M , I , N , T 


5 


tAcc I 


GTHKAC 


P 


2 , 


4 


^ a U T iLf O 

<b,4t,l,N,P,T 




Acc III 


TCCCGA 


P 


1 , 


5 


<T 




Acv I 


GRCCYC 


P 


2, 


4 


<A^'. a 1 1 c N 




Afl II' 


CTTAAC 


P 


1, 


5 






tAfl IIIACRYCT 


P 


1, 


5 


<ncr.e 


10 


Ah 3 III 


TTTAAA 


P 


3 . 


3 


< 3 , I tM u ra L , n , i , M 




+AlvN I 


CAGNNNCTG 


P 


<i» 


3 


<- 1» 




Aoa I 


GGGCCC 


P 


5 , 


1 


< M , I , N , P , T 




ApaL I 


GTGCAC 


P 




5 


< t* , T 


15 


Ase I 


ATTAAT 


P 


2 


4 


< ^^ 




Aso7;8 


GGTACC 


P 


1 , 




< no ne 




Asu II 


TTCCAA 


P 


2 , 


4 


^ r , ^ V C 5 ^ O 1 ; 




tAva I 


CYCCRG 


P 


i , 


5 


j'C'> OM T U OT* 

<i>. ,o,n, l,[^,r, 1 , 












Apu I : T 


20 


Ava III 


ATGCAT 


P 


5 , 


1 


< T ; *; s 1 I ; M , ^^ , P , T 












EcoT2 2 r : T 




Avr II 


CCTAGG 


P 


1, 


5 


< *I 




Dal I • 


TGGCCA 


P 


3 , 


3 


< S , 3 , I . t< , T 




BanH I 


GCATCC 


P 


1 , 


5 




25 


%Ban X 


GGYRCC 


P 


1 , 


5 






Bbe I 


GGCCCC 


P 


5 , 


1 






IBbv I 


GCAGC 


nP 


13 , 


17 


< I N T 




tDbv II 


CAAGAC 


nP 


3 , 


12 






Be 1 I 


TGATCA 


P 


1 


3 




3 0 


+ Bgl I 


GCCKNNNNGCC 


■P 


7 [ 




<S, 3, I , N , ?,T 






ACATCT 


P 


1 , 


5 


<S , 3, : , ?,T 




+ B_Lri I 


GGATC 


nP 




10 


<A -vr : N 




+ bs:n 1 


GAATCCN 


r.P 


7 \ 


5 


<::,T 




RspH I 


TCATCA 


P 


1, 


5 


<i; 


35 


\BsoH 1- 


ACCTCC 


nP 




14 


<N 




BssH II 


GCGCGC 


P 


1 , 


5 


<:: , T 




+ BstE; IIGCTNACC 


P 


1 , 


G 


<5 , 3, M, N, T 




tBstX I 


CCANNNNNDTCG P 


8, 




<:'', ?. T 




Ctr I 


YGOCCR 


? 


1 , 


5 


<Z?.Q I:N,T 


40 


Cla I 


ATCCAT 


P 


2, 




<s , 3, >i,u,T: 












lU : I 




+Dra II 


RGGN'CCY 


P 


2 , 


5 


<:!.T : Ecocn09 I: 




+Ora IIICACSNNGTC 


P 


6 , 


3 


T 




EC04 7 IIAGCGCT 


P 


3 , 


3 


<nor.o 


45 


+EcoN I 


ccT:;t:r;NNACG 


P 


5 » 


6 


( soon ) 




EcoR I 


GAATTC 


P 


1 , 


5 


<S, 3. M. i . P,T 




kcoR y 


GATATC 


P 


3 , 


3 


< S . D . M , I . ^J , P . T 




^■Esp I 


GCTNACC 


P 


2 , 


5 


<T 




tFok I 


GGATG 


nP 


1-i , 


18 


<M .r.'.T 


50 


cn 11 


YCCCCG 


nP 


I, 


5 


< 




Hae I 


KCGCCW 


P 


3 , 


3 


< 




Kae II 


RGCGCY 


P 


5, 


r 


<S , B.M. I, N,T 



Table 4, continued.

Enzyme       Recognit.      Symm  cuts    Supply

+Hga I       GACGC          nP    10,15   <?
%HgiA I      GWGCWC         P     5,1     <N
%HgiC I      GGYRCC         P     1,5     <?
%HgiJ II     GRGCYC         P     5,1     <?; Ban II: S,M,I,N,T
Hind II      GTYRAC         P     3,3     <Hinc II: S,B,I,…
Hind III     AAGCTT         P     1,5     <S,B,M,I,N,P,T
Hpa I        GTTAAC         P     3,3     <S,B,M,I,N,P,T
+Hph I       GGTGA          nP    13,12   <N,T
Kpn I        GGTACC         P     5,1     <…; Asp718: M
+Mbo II      GAAGA          nP    13,12   <?
Mlu I        ACGCGT         P     1,5     <M,N,P,T
Mst I        TGCGCA         P     3,3     <T; Fsp I: ?,N
Nae I        GCCGGC         P     3,3     <M,N,T
Nar I        GGCGCC         P     2,4     <B,N,T
Nco I        CCATGG         P     1,5     <B,M,?,P,T
Nde I        CATATG         P     2,4     <B,N,T
Nhe I        GCTAGC         P     1,5     <M,N,?,T
Not I        GCGGCCGC       P     2,6     <M,N,P,T
Nru I        TCGCGA         P     3,3     <B,M,N,T
Nsp7524 I    RCATGY         P     5,1     <?
NspB II      CMGCKG         P     3,3     <?
+PflM I      CCANNNNNTGG    P     7,4     <N
+Ple I       GAGTC          nP    9,10    <N
Pml I        CACGTG         P     3,3     <none
+PpuM I      RGGWCCY        P     2,5     <N
+Pss I       RGGNCCY        P     5,2     <I
Pst I        CTGCAG         P     5,1     <S,B,M,I,N,P,T
Pvu I        CGATCG         P     4,2     <S,B,N,B(Xor II),M,P,T
Pvu II       CAGCTG         P     3,3     <S,B,M,I,N,P,T
+Rsr II      CGGWCCG        P     2,5
Sac I        GAGCTC         P     5,1     <B(Sst I),M,I,N,P,T
Sac II       CCGCGG         P     4,2     <B(Sst II),I,N,P,T
Sal I        GTCGAC         P     1,5     <B,M,I,N,P,T
+Sau I       CCTNAGG        P     2,5     <M; Cvn I: B; Mst II: T; Bsu36 I: N
Sca I        AGTACT         P     3,3     <M,?,P,T
%SfaN I      GCATC          nP    10,14   <N
+Sfi I       GGCCNNNNNGGCC  P     8,5
Sma I        CCCGGG         P     3,3     <B,M,I,P,T
SnaB I       TACGTA         P     3,3     <M,N,T
Spe I        ACTAGT         P     1,5     <M,N,T
Sph I        GCATGC         P     5,1     <B,M,I,N,P,T
Ssp I        AATATT         P     3,3     <M,N,T
Stu I        AGGCCT         P     3,3     <?,M(Aat I),?,T
%Sty I       CCWWGG         P     1,5
Table 4, continued.

Enzyme       Recognit.      Symm  cuts    Supply

%Taq II      GACCGA         nP    17,15   <none
%Taq II      CACCCA         nP    17,15   <none
+Tth111 I    GACNNNGTC      P     4,5     <I,N,T
%Tth111 II   CAARCA         nP    16,14   <none
Xba I        TCTAGA         P     1,5     <B,M,I,N,P,T
Xca I        GTATAC         P     3,3     <N (soon)
Xho I        CTCGAG         P     1,5     <B,M,I,P,T; Ccr I: T; PaeR7 I: N
Xho II       RGATCY         P     1,5     <M,T; N (BstY I)
Xma I        CCCGGG         P     1,5     <I,N,P,T
Xma III      CGGCCG         P     1,5     <B; …; Eco52 I: T
Xmn I        GAANNNNTTC     P     5,5     <N,M(Asp700),T

N restrct = 100

Notes:
Symm: P for palindromic, nP for non-palindromic.
cuts: the first number indicates the position of the cut in
the top strand (1 means after the first base of the
recognition sequence); the second number indicates the
position of the cut in the lower strand, counting
left-to-right.
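The Symm and cuts columns can be checked mechanically. This Python sketch (our illustration, not part of the original table) decides P versus nP by comparing an IUB-coded site with its reverse complement, and derives the single-stranded overhang from the two cut positions as defined in the notes; cut positions beyond the recognition sequence are padded with N.

```python
# Complements of IUB nucleotide codes.
IUB_COMPLEMENT = {
    "A": "T", "T": "A", "G": "C", "C": "G",
    "R": "Y", "Y": "R", "M": "K", "K": "M",
    "S": "S", "W": "W", "B": "V", "V": "B",
    "D": "H", "H": "D", "N": "N",
}


def reverse_complement(site):
    return "".join(IUB_COMPLEMENT[b] for b in reversed(site))


def symmetry(site):
    """'P' if the site equals its reverse complement, else 'nP'."""
    return "P" if reverse_complement(site) == site else "nP"


def overhang(site, top_cut, bottom_cut):
    """Overhang left by cutting the top strand after base `top_cut` and the
    bottom strand after base `bottom_cut` (both counted left to right)."""
    if top_cut == bottom_cut:
        return "blunt", ""
    lo, hi = sorted((top_cut, bottom_cut))
    padded = site + "N" * max(0, hi - len(site))  # cuts outside the site
    kind = "5'" if top_cut < bottom_cut else "3'"
    return kind, padded[lo:hi]


for name, site, cuts in [("EcoR I", "GAATTC", (1, 5)),
                         ("Pst I", "CTGCAG", (5, 1)),
                         ("EcoR V", "GATATC", (3, 3)),
                         ("Fok I", "GGATG", (14, 18))]:
    print(name, symmetry(site), *overhang(site, *cuts))
```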




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Table 5: Potential sites in ipbd gene. 



Summary of cuts. 



10 



15 



20 



25 



Enz 
Enz 
Enz 
Enz 
Enz 
Enz 
Enz 
Enz 
Enz 
Enz 
Enz 
Enz 



% Acc I has 
ACl 11 has 
A pa r has 2 
A?u II hJG 
Ava III h*:s 
SspM 11 hai; 
■ BsrH II has 
t BstX I has 
f pra II has 



+ECON I has 
+ f:pp I r-.as 
Hind III ha 



3 elective sites 
I elective sites 
elective sites : 
1 elective sites 
L elective sites 
L elective sites 

2 elective sites 

1 elective sites 

3 elective sites 

2 elective sites 
! elective sites 
; 6 elective site 



35 



40 



45 



Enz 

Enz 
Enz 
Enz 
' Enz 



elective sites : 
elective sites : 
elective sites : 
elective sites : 
elective sites : 
elective sites : 

1 elective sites 

1 elective sites 

2 elective sites 
1 elective sites 

2 elective sites 
elective sites : 
elective sites : 
elective sites : 



6 elective sites 
14 3 

Xba I has 1 elective sites : 
Xca I has 2 elective sites : 
Xho I has 1 elective sites : 
Xna III has 3 elective sites 




t stv I has 



: 96 169 281 
: 19 

102- 103 
: 381 

: 314 • 

: 72 

: 67 115 
: 323 

: 102 103 226 

: 62 94 
: 57 1S7 
s : 9 23 60 
287 361 386 

48 

314 

238 343 
323 

25 289 388 

38 65 

: 94 
: -223 

: 102 226 

: 1C2 
: 24 261 

12 45 373 

221 

23 70 150 
237 356 
: 11 4 4 
6 3 32 3 38 3 
84 

96 169 
85 

: 70 2C9 
242 



Enzymes not cutting i rb.i . 




BanH I 
EcoR V 
Sal I 



Bel I 
Hpa I 
Sau I 



F^stE II 
Mot I 
Sr.a I 



III 

^ • ■ . 

i 



■^4 



'.i 

m 
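A sketch of a simple site scan: translate each IUB recognition sequence into a regular expression and report 1-based start positions, including overlapping matches. The DNA string is a made-up example, not the ipbd gene, and finding "elective" sites (sites that variegation could create) would additionally require scanning the ambiguous DNA of Table 3, which this sketch does not attempt.

```python
import re

# IUB codes mapped to regular-expression character classes.
IUB_REGEX = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "[AG]", "Y": "[CT]", "M": "[AC]", "K": "[GT]",
    "S": "[CG]", "W": "[AT]", "B": "[CGT]", "D": "[AGT]",
    "H": "[ACT]", "V": "[ACG]", "N": "[ACGT]",
}


def find_sites(dna, site):
    """Return 1-based start positions of `site` (IUB codes) in `dna`,
    allowing overlapping matches via a zero-width lookahead."""
    pattern = "".join(IUB_REGEX[b] for b in site)
    return [m.start() + 1 for m in re.finditer(f"(?={pattern})", dna.upper())]


example = "TTGTATACAGGGCCCAAGCTTGTCGACAAGCTT"  # hypothetical sequence
for name, site in [("Acc I", "GTMKAC"), ("Apa I", "GGGCCC"),
                   ("Hind III", "AAGCTT")]:
    print(name, find_sites(example, site))
```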



Table 6: Exposure of amino acid types in T4 lysozyme & HEWL.

HEADER HYDROLASE (O-GLYCOSYL) 18-AUG-86 2LZM
COMPND LYSOZYME (E.C.3.2.1.17)
AUTHOR L.H.WEAVER,B.W.MATTHEWS

Coordinates from Brookhaven Protein Data Bank: 1LYM.
Only Molecule A was considered.

HEADER HYDROLASE (O-GLYCOSYL) 29-JUL-82 1LYM
COMPND LYSOZYME (E.C.3.2.1.17)
AUTHOR J.HOGLE,S.T.RAO,M.SUNDARALINGAM

Solvent radius = 1.40. Atomic radii in Table 7.
Surface area measured in Angstroms^2.

Type   N   <area>   Sigma   Max     Min     <exposed> (fraction)

ALA   27   211.0    1.47    214.3   207.1    85.?   (0.40)
CYS   10   239.8    3.56    245.5   234.4    38.3   (0.16)
ASP   17   271.1    5.36    281.4   262.5   127.?   (0.?)
GLU   10   297.2    5.78    304.9   285.4   100.7   (0.34)
PHE    3   316.6    5.92    325.?   307.5    99.8   (0.32)
GLY   23   185.5    1.31    188.3   183.3    91.3   (0.50)
HIS    2   297.7    3.23    301.0   294.5    32.9   (0.11)
ILE   16   273.1    3.61    285.6   269.6    57.5   (0.21)
LYS   13   309.?    5.38    321.9   300.1   147.?   (0.48)
LEU   24   282.6    6.75    304.0   269.3   109.0   (0.39)
MET    7   293.0    5.70    299.5   283.1    88.2   (0.30)
ASN   26   273.0    5.75    285.1   262.6   143.?   (0.53)
PRO    5   239.9    ?.75    242.1   234.6   123.?   (0.?)
GLN    3   299.5    4.75    305.8   291.5   145.9   (0.?)
ARG   24   344.7    8.66    355.8   326.7   240.7   (0.70)
SER   16   223.6    3.59    236.6   223.3    93.2   (0.?)
THR   13   250.3    3.89    257.2   244.2   139.9   (0.56)
VAL   15   254.3    4.05    261.8   245.7   111.?   (0.44)
TRP    9   359.4    3.3?    366.4   355.1   102.0   (0.28)
TYR    9   335.8    4.97    342.0   325.0    72.6   (0.22)
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Table 7: Atomic radii 
Angstrons 



-alpha 



1.70 



Ocarbonyl 1-52 
Other atorr.s 1.80 






Table 8

Fraction of DNA molecules having n non-parental bases
when reagents that have fraction M of parental nt are
used.

M       .9965     .97716    .92612    .8577     .79433    .63096

f0      .9000     .5000     .1000     .0100     .0010     .000001
f1      .09499    .35061    .2393     .04977    .00777    .0000175
f2      .00485    .1183     .2763     .1197     .0292     .000149
f3      .00016    .0259     .2061     .1354     .0705     .000812
f4      .000004   .00409    .1110     .2077     .1232     .003207
f8      0.        2x10^-7   .00096    .0336     .1182     .080155
f16     0.        0.        0.        5x10^-7   .00006    .027231
f23     0.        0.        0.        0.        0.        .0000089
most    0         0         ?         5         7         12

"most" is the value of n having the highest probability.
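The entries of Table 8 appear consistent with a binomial model for a segment of 30 bases (our inference, not stated in the table: for example 0.9965^30 ≈ 0.90 and 0.63096^30 ≈ 10^-6). Under that assumption f_n = C(30, n) M^(30-n) (1-M)^n, which the Python sketch below evaluates.

```python
from math import comb

LENGTH = 30  # inferred segment length, not stated explicitly in the table


def fraction_with_errors(n, m, length=LENGTH):
    """Binomial probability of exactly n non-parental bases when each base
    is parental with probability m."""
    return comb(length, n) * m ** (length - n) * (1.0 - m) ** n


for m in (0.9965, 0.97716, 0.92612, 0.8577, 0.79433, 0.63096):
    row = [fraction_with_errors(n, m) for n in (0, 1, 2, 3, 4)]
    print(m, " ".join(f"{p:.4g}" for p in row))
```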






Table 9: best vgCodon

Program "Find Optimum vgCodon."
INITIALIZE-MEMORY-OF-ABUNDANCES
DO ( t1 = 0.21 to 0.31 in steps of 0.01 )
. DO ( c1 = 0.13 to 0.23 in steps of 0.01 )
. . DO ( a1 = 0.23 to 0.33 in steps of 0.01 )
Comment Calculate g1 from other concentrations.
. . . g1 = 1.0 - t1 - c1 - a1
. . . IF( g1 .ge. 0.15 )
. . . . DO ( a2 = 0.37 to 0.50 in steps of 0.01 )
. . . . . DO ( c2 = 0.12 to 0.20 in steps of 0.01 )
Comment Force D+E = R+K
. . . . . . g2 = (g1*a2 - 0.5*a1*a2)/(c1 + 0.5*a1)
Comment Calc t2 from other concentrations.
. . . . . . t2 = 1. - a2 - c2 - g2
. . . . . . IF( g2 .gt. 0.1 .and. t2 .gt. 0.1 )
. . . . . . . CALCULATE-ABUNDANCES
. . . . . . . COMPARE-ABUNDANCES-TO-PREVIOUS-ONES
. . . . . . end_IF_block
. . . . . end_DO_loop ! c2
. . . . end_DO_loop ! a2
. . . end_IF_block ! if g1 big enough
. . end_DO_loop ! a1
. end_DO_loop ! c1
end_DO_loop ! t1
WRITE the best distribution and the abundances.
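A runnable Python version of the Table 9 search is sketched below. The table does not define CALCULATE-ABUNDANCES or the comparison; here we assume the standard genetic code, a third base of k (equal G and T), and that "best" means the largest abundance of the least-favored amino acid. The c2 range is read as 0.12 to 0.20. With step=1 the loops match the table; step=2 runs much faster.

```python
from itertools import product

BASES = "TCAG"
AMINO = ("FFLLSSSSYY**CC*W" "LLLLPPPPHHQQRRRR"
         "IIIMTTTTNNKKSSRR" "VVVVAAAADDEEGGGG")
CODE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO)}
THIRD = {"G": 0.5, "T": 0.5}  # third base fixed at k = equal G and T


def abundances(first, second, third=THIRD):
    """Amino-acid (and stop) abundances for a vgCodon of three base mixtures."""
    dist = {}
    for b1, b2, b3 in product(first, second, third):
        aa = CODE[b1 + b2 + b3]
        dist[aa] = dist.get(aa, 0.0) + first[b1] * second[b2] * third[b3]
    return dist


def find_optimum_vgcodon(step=1):
    """Grid search over the Table 9 ranges, in percent, moving `step` per point.

    ASSUMPTION: "best" = largest abundance of the least-favored amino acid
    (stop excluded).
    """
    best = None
    for pt1, pc1, pa1 in product(range(21, 32, step), range(13, 24, step),
                                 range(23, 34, step)):
        t1, c1, a1 = pt1 / 100, pc1 / 100, pa1 / 100
        g1 = 1.0 - t1 - c1 - a1
        if g1 < 0.15 - 1e-9:          # tolerance for floating-point round-off
            continue
        for pa2, pc2 in product(range(37, 51, step), range(12, 21, step)):
            a2, c2 = pa2 / 100, pc2 / 100
            g2 = (g1 * a2 - 0.5 * a1 * a2) / (c1 + 0.5 * a1)  # force D+E = R+K
            t2 = 1.0 - a2 - c2 - g2
            if g2 <= 0.1 or t2 <= 0.1:
                continue
            first = {"T": t1, "C": c1, "A": a1, "G": g1}
            second = {"T": t2, "C": c2, "A": a2, "G": g2}
            dist = abundances(first, second)
            score = min(v for aa, v in dist.items() if aa != "*")
            if best is None or score > best[0]:
                best = (score, first, second, dist)
    return best


if __name__ == "__main__":
    score, first, second, _ = find_optimum_vgcodon(step=2)  # step=1 as in Table 9
    print(f"least-favored amino acid: {score:.4f}")
    print("first base:", {b: round(f, 3) for b, f in first.items()})
    print("second base:", {b: round(f, 3) for b, f in second.items()})
```

The charge-balance line works because, with k in the third position, D+E = g1*a2 while R+K = g2*(c1 + 0.5*a1) + 0.5*a1*a2.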






Table 11: Calculate worst codon.

Program "Find worst vgCodon within Serr of given
distribution."
INITIALIZE-MEMORY-OF-ABUNDANCES
Comment Serr is the error level.
READ Serr
Comment T1i, C1i, A1i, G1i, T2i, C2i, A2i, G2i, T3i, G3i
Comment are the intended nt-distribution.
READ T1i, C1i, A1i, G1i
READ T2i, C2i, A2i, G2i
READ T3i, G3i
Fdwn = 1. - Serr
Fup = 1. + Serr
DO ( t1 = T1i*Fdwn to T1i*Fup in 7 steps )
. DO ( c1 = C1i*Fdwn to C1i*Fup in 7 steps )
. . DO ( a1 = A1i*Fdwn to A1i*Fup in 7 steps )
. . . g1 = 1. - t1 - c1 - a1
. . . IF( (g1-G1i)/G1i .lt. -Serr )
Comment g1 too far below G1i, push it back
. . . . g1 = G1i*Fdwn
. . . . factor = (1.-g1)/(t1 + c1 + a1)
. . . . t1 = t1*factor
. . . . c1 = c1*factor
. . . . a1 = a1*factor
. . . end_IF_block
. . . IF( (g1-G1i)/G1i .gt. Serr )
Comment g1 too far above G1i, push it back
. . . . g1 = G1i*Fup
. . . . factor = (1.-g1)/(t1 + c1 + a1)
. . . . t1 = t1*factor
. . . . c1 = c1*factor
. . . . a1 = a1*factor
. . . end_IF_block
. . . DO ( a2 = A2i*Fdwn to A2i*Fup in 7 steps )
. . . . DO ( c2 = C2i*Fdwn to C2i*Fup in 7 steps )
. . . . . DO ( g2 = G2i*Fdwn to G2i*Fup in 7 steps )
Comment Calc t2 from other concentrations.
. . . . . . t2 = 1. - a2 - c2 - g2
. . . . . . IF( (t2-T2i)/T2i .lt. -Serr )
Comment t2 too far below T2i, push it back
. . . . . . . t2 = T2i*Fdwn
. . . . . . . factor = (1.-t2)/(a2 + c2 + g2)
. . . . . . . a2 = a2*factor
. . . . . . . c2 = c2*factor
. . . . . . . g2 = g2*factor
. . . . . . end_IF_block
. . . . . . IF( (t2-T2i)/T2i .gt. Serr )
Comment t2 too far above T2i, push it back
. . . . . . . t2 = T2i*Fup
. . . . . . . factor = (1.-t2)/(a2 + c2 + g2)
. . . . . . . a2 = a2*factor
. . . . . . . c2 = c2*factor
. . . . . . . g2 = g2*factor
. . . . . . end_IF_block
. . . . . . IF( g2 .gt. 0.0 .and. t2 .gt. 0.0 )
. . . . . . . t3 = 0.5*(1.-Serr)
. . . . . . . g3 = 1. - t3
. . . . . . . CALCULATE-ABUNDANCES
. . . . . . . COMPARE-ABUNDANCES-TO-PREVIOUS-ONES
. . . . . . . t3 = 0.5
. . . . . . . g3 = 1. - t3
. . . . . . . CALCULATE-ABUNDANCES
. . . . . . . COMPARE-ABUNDANCES-TO-PREVIOUS-ONES
. . . . . . . t3 = 0.5*(1.+Serr)
. . . . . . . g3 = 1. - t3
. . . . . . . CALCULATE-ABUNDANCES
. . . . . . . COMPARE-ABUNDANCES-TO-PREVIOUS-ONES
. . . . . . end_IF_block
. . . . . end_DO_loop ! g2
. . . . end_DO_loop ! c2
. . . end_DO_loop ! a2
. . end_DO_loop ! a1
. end_DO_loop ! c1
end_DO_loop ! t1
WRITE the WORST distribution and the abundances.
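A Python sketch of Table 11 under the same assumptions as for Table 9 (standard genetic code; "worst" taken to mean the smallest abundance of the least-favored amino acid, stop excluded). The push-back rule for g1 and t2 follows the table; n=7 reproduces its 7-step grids, while a smaller n (at least 2) is useful for a quick run. The intended distributions in the demo are hypothetical.

```python
from itertools import product

BASES = "TCAG"
AMINO = ("FFLLSSSSYY**CC*W" "LLLLPPPPHHQQRRRR"
         "IIIMTTTTNNKKSSRR" "VVVVAAAADDEEGGGG")
CODE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO)}


def abundances(first, second, third):
    """Amino-acid (and stop) abundances for a vgCodon of three base mixtures."""
    dist = {}
    for b1, b2, b3 in product(first, second, third):
        aa = CODE[b1 + b2 + b3]
        dist[aa] = dist.get(aa, 0.0) + first[b1] * second[b2] * third[b3]
    return dist


def grid(value, serr, n):
    """n evenly spaced points from value*(1-serr) to value*(1+serr); n >= 2."""
    lo, hi = value * (1.0 - serr), value * (1.0 + serr)
    return [lo + i * (hi - lo) / (n - 1) for i in range(n)]


def push_back(derived, intended, others, serr):
    """Table 11 rule: pin a derived fraction that strays more than serr
    (relative) from its intended value, then rescale the other three."""
    rel = (derived - intended) / intended
    if -serr <= rel <= serr:
        return derived, others
    derived = intended * (1.0 - serr) if rel < 0 else intended * (1.0 + serr)
    factor = (1.0 - derived) / sum(others)
    return derived, [x * factor for x in others]


def find_worst_vgcodon(first_i, second_i, serr, n=7):
    """Worst least-favored amino-acid abundance within +/- serr (assumption)."""
    worst = None
    for t1, c1, a1 in product(grid(first_i["T"], serr, n),
                              grid(first_i["C"], serr, n),
                              grid(first_i["A"], serr, n)):
        g1, (t1, c1, a1) = push_back(1.0 - t1 - c1 - a1, first_i["G"],
                                     [t1, c1, a1], serr)
        first = {"T": t1, "C": c1, "A": a1, "G": g1}
        for a2, c2, g2 in product(grid(second_i["A"], serr, n),
                                  grid(second_i["C"], serr, n),
                                  grid(second_i["G"], serr, n)):
            t2, (a2, c2, g2) = push_back(1.0 - a2 - c2 - g2, second_i["T"],
                                         [a2, c2, g2], serr)
            if g2 <= 0.0 or t2 <= 0.0:
                continue
            second = {"T": t2, "C": c2, "A": a2, "G": g2}
            for t3 in (0.5 * (1.0 - serr), 0.5, 0.5 * (1.0 + serr)):
                third = {"T": t3, "G": 1.0 - t3}
                dist = abundances(first, second, third)
                score = min(v for aa, v in dist.items() if aa != "*")
                if worst is None or score < worst[0]:
                    worst = (score, first, second, third, dist)
    return worst


if __name__ == "__main__":
    # Hypothetical intended first- and second-base distributions.
    Q = {"T": 0.26, "C": 0.18, "A": 0.26, "G": 0.30}
    F = {"T": 0.22, "C": 0.16, "A": 0.40, "G": 0.22}
    score, *_ = find_worst_vgcodon(Q, F, serr=0.05, n=3)  # n=7 as in Table 11
    print(f"worst least-favored abundance: {score:.4f}")
```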



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Tabic 13: BPTI Homoloques 



-3 
-2 
-1 
1 
2 
3 
4 
5 
6 
7 
8 
9 
10 
X 1 
12 
13 
14 
15 
16 
17 
18 
19 
20 
21 
22 
23 
24 
25 
26 
27 
28 
29 
30 
31 
32 
33 
34 
35 
36 
37 
38 
39 
40 
41 
42 
43 



R 
P 
D 
F 
C 
L 
E 
P 
P 
Y 
T 
G 
P 
C 
K 
A 

I 
I 
R 
Y 
F 
Y 
N 
A 
K 
A 
C 
L 
C 
Q 
T 
F 
V 
Y 
G 
C 

c 

R 
A 
K 
R 
N 



4 
F 
Q 
T 
P 
P 
D 
L 
C 
Q 
L 
P 
Q 
A 
R 
C 
P 
C 
K 
A 
A 
L 
L 
R 
Y 
F 
Y 
N 
S 
T 
S 
H 
A 
C 
E 
. P 
F 
T 
Y 
C 
G 
C 
Q 
C 
U 

u 

N 



9 10 11 12 13 14 15 16 17 18 19 
--------Z-- 



R 
P 
D 
F 
C 
L 
E 
P 
P 
Y 
T 
G 
P 
C 
I 
A 
R 
I 
I 
R 
Y 
F 
Y 
U 
A 
K 
A 
G 
L 
C 
Q 
T 
F 
V 
Y 
G 
•G 
C 
R 
A 
K 
R 
. N 



Q 
? 
L 
R 
K 
L 
C 
I 
L 
H 
R 
N- 
P 
C 
R 
C 
Y 

Q . 



2 
G 
R 
P 
S 
F 
C 
N 
L 
P 
A 
E 
T 
G 
P 
C 
K 
A 
S 
I 
R 
Q 
Y 
Y 
Y 
:l 
S 
K 
S 
G 
C 
C 
Q 
Q 
F 

r 

Y 
G 
G 
C 

R . 

G 

N 

Q 

N 





R 3 20 21 23 24 25 26 27 23 29 30 3 1 32 33 
-5 - -- -- -- __--_-D 

-A - 

-3 - -- -- -- -_---TP 

-2 Z-LZRK---RR-ET 
-1 P-QDDN---QK-RT 

1 RRH HRRIKTRRRCD 

2 RPRPPPNEVHHPFL 

3 KYTKKTCDARPDL? 
LAFFFFOSADDFDI 

5 CCCCCCCCCCCCCC 

6 lEKYYNEQtJDDLTE 

7 LLLLLLLLLKKESQ 
3 HIPPPLPGPPPPP A 
9 RVA AAPKYVPPPPFG 

10 NAE DDEVSIDD YVD 

11 PA P P PTVA RKTTTA 

12 GGCGGCGCCGKGGC 

13 RPPRRRPPPS'IPPL 

14 CCCCCCCCCCCCCC 

15 YHKKLNRMR--KRF 

16 DFAAAAAGAGQAAG 

17 KFSHY LRMFPTKGY 

18 rillMIFTIVVMFM 

19 PSPPPPPSQRRIKK 

20 AAARRARRLAARRL 

21 FFFFFFYYWFFYYY 

22 Y Y Y Y Y Y Y'F A Y Y F N S 

23 YVYYYYYYFYYYVY 
2 4 N S N D N N N N D D K M K N 
2 5 QKWSPSSCA7PATQ 
2 6 K G AAA H S T V R S K R E 

27 KAASGLSSKLAATT 

28 KKKNNHKHGKKGKK 

29 QKKKKKRAKTRFQN 

30 CCCCCCCCCCCCCC 

31 E, YQNEQEEVKVEEE 

32 RPLKKKKTLAQTPE 

33 FFFFFFFF FFFFFF 

34 OTHIINIQPQRVKI 

35 WYYYYYYYYYYYYY 

36 S3GGGCGCCSGGGC 

37 G GGGGGGCGGGGGC 

38 CCCCCCCCCCCCCC 

39 GRKPRGGM QDDKKQ 

40 GGCGGGGGCCGAGG 
4 1 N N N N N » N N fJ D D K N N 
42 SAAAAAAGGHHSGD 
4 3 N N N » N N N H M G G U H N 



Table 13, continued.



n i 


20 


21 


22 


23* 


24 


25 


26 


27 


28 


29 


30 


31 


32 


33 




R 


R 


R 


N 


* N 


N 


N 


N 


K 


H 


N 


N 


H 


R 


45 


F 


F 


F 


F 


f' 


F 


F 


F 


F 


F 


F 


F 


Y 


F 


46 


K 


K 


S 


K 


K 


K 


H 


V 


Y 


K 


K 


R 


K 


S 


47 


T 


T 


T 


•T 


T 


T 


T 


T 


£J 


T 


S 


S 


S 


T 


48 


I 


I 


I 


W 


W 


I 


L 


E 


E 


E 


D 


A 


E 


L 


49 


E 


E 


E 


D 


D 


D 


E 


K 


K 


T 


H 


E 


Q 


A 


50 


E 


E 


K 


E 


E 


E 


E 


E 


E 


L 


r 


D 


D 


c 


51 


C 


C 


C 


C 


C 


C 


C 


C 


C 


C 


c 


C 


C 


C 


52 


R 


R 


R 


R 


R 


Q 


E. 


L 


R 


R 


R 


M 


L 


c 


53 


R 


R 


H 


Q 


H 


R 


K 


Q 


E 


C 


C 


R 


D 


Q 


54 


T 


T 


A 


T 


T 


T 


V 


T 


Y 


E 


E 


T 


A 


K 


55 


C 


C 


C 


C 


C 


C 


c 


C 


C 


C 


. C 


C 


C 


C 


56 


I 


V 


V 


c 


V 


A 


c 


R 


G 


L 


E 


G 


S 


r 


57 


G 


V 


c 


A 


A 


A 


V 




V 


V 


L 


G 


c 




58 








S 


S 


K 


R 




? 


Y 


Y 


A 


f 




59 








A 


c 




S 




G 


P 


R 








60 










I 


G 






D 













20 Dendroaso is anaiisticcps (Eastern Green Manvba) 
C13 S2 C3 toxin {DUFT85) 

21 npnH rnPi9:ni^ DcLvlop is DOlvlcDes (Black manba) B toxin 
(DUFTS5) 

22 npnrir n.-^^^ois Dolvleo is colvlcpes (Black Mamba) E toxin 
(DUFTSS) 

2 3 Vicarrt ar.modvcos TI toxin (DUFT8 5) 

24 Vinera jrnorivtcs CTI toxin (DUFTSS) 

25 nuncarus fascijtus VTII B toxin (DUFT35) 

26 Aner.onia sulc;sta (sea aner.ono) 5 II (LsUFTSS) 

27 Ho:no s;iplens HI-1-;- "inactive" domain (DUFT85) 

28 Hor.o sapiens "active" domain (DUFT85) 

29 beta bunqarotoxin Bl (DUFTSS) 

30 beta bungarotoxin D2 {DUFT35) 

31 Bovine spleen TI II (F10RS5) 

32 TAchvpIeus triricntntns (Horseshoe crab) hemocyte 
inhibitor (NAK.\3 7) 

33 Bor.bvx nori (silkvorn) SCI-III (SASAS-1) 

Notes : 

a) both beta bungarotox ins have residue 15 deleted. 

b) B. r.ori has an extra residue between 05 and C14;' u-e 
have assigned F and G to residue 9. 

c) all natural proteins have C at 5. 14, 30, 3S, 50, i 55. 

d) all ho.-nologues have F33 and G37. 

e) extra C's in bunqaroCox ins fora interchain cystine 
bridges 



0 



O 
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Toble 14: Tally of lonizablo Groups. 
BPTI hoQologues. 



Sequence 
Identi f ier 

1 

2 

3 

4 

5 

6 

7 

8 

9 
10 
11 
12 
13 
14 
15 
16 
17 
18 
19 
20 
21 
22 
23 
24 
25 
26 
27 
23 
29 
30 
31 
32 
33 - 



E 
2 
2 
2 
4 
4 
2 
2 
2 
2 
2 
3 
3 
2 
3 
4 
5 
4 
1 
2 
3 
3 
2 
1 
2 
2 
5 

4 

3 
2 
2 
3 
3 
7 



NH 

1 

1 

1 

1 

1 

1 

1 

1 

1 

1 

1 

1 

1 

1 

1 

1 
1 
1 
1 
1 
1 
1 
1 
1 
1 
1 

1 ' 

1 

1 

1 

1 

1 



CO 2 

1 

1 

1 

1 

1 

1 

1 

1 

1 

1 

1 

1 

1 

1 

1 

1 

1 

1 

1 

1 

1 

1 

1 

1 

1 

1 

1 

1 

1 

1 

1 

1 

1 



6 
6 
6 

-1 
2 
5 
5 
5 
5 
5 
5 

11 

10 
2 
4 
3 
7 
4 

11 
8 
5 
7 
3 
5 
5 
2 

-1 
2 
4 

s 

4 
4 
-7 



I 

16 
16 
16 
13 
16 
15 
15 
15 
15 
15 
19 
19 

la 

14 
16 
19 
2 1 
8 
17 
20 
13 
13 
15 
17 
13 
16 
11 
14 
2 2 
23 
16 
IS 
17 




Sequences given in Table 10 

+ is sum of K -t- R + NH - D 
molecule at pH 7.0 

if is sum of K R -t- r:H f D 
groups at pH 7.0. 



- E 



C02 , approxinate charge on 



+ E + C02 , i.e. nu.T±)er of icni::ed 
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Table 15: Aoino acids cbccrvcd ac each Residue 
BPTI hoaoloques 



—J 



Res. 
-5 
-< 
-3 
-2 
-1 

1 

2 



5 
6 
7 
8 
9 
10 
1 1 
12 
13 
14 
15 
16 
17 
13 
19 
20 
21 
22 
23 
24 
25 
26 
27 
23 
29 
30 
31 
32 
33 
34 
35 
36 
37 
33 
39 



Nunber 
Oi f f ernet 
AAs 

2 

2 

5 
10 
10 
10 

9 
10 

7 

1 
10 

5 

7 

9 
10 
10 

2 

5 

3 
12 

7 
12 

6 

7 

5 

4 

6 
.2 

10 

9 

5 

7 
10 

1 

7 
11 

1 
11 



Contents 
D -32 
E -32 

T P F Z -29 
Z3 R3 Q2 T2 H 
D4 T2 P2 Q2 E 
R21 A2 K2 H2 
F2 0 R4 A2 H2 



C L K E -18 
C tJ K R -18 
P L I T C D 
N E V F L 



D15 K6 T3 R2 P2 S Y G A L 

F19 D4 L3 Y2 12 A2 S 
C33 

Lll E5 ru K3 Q2 12 Y2 D2 T 

LIS Ell K2 S Q 

P2S H2 A2 I L C F 

P17 A6 V3 R2 Q L K Y F 

Yll E7 D4 A2 112 R2 V2 S I 

T17 P5 A3 R2 I S Q Y. V K 

C32 K 

P22 R6 L3 N I 

C31 T A 

K15 R4 Y2 K2 L2 

A22 G5 02 



-2 V G A r N F 



R12 



L Q 



R K D F 
K5 A2 Y3 H2 S2 F2 
121 «4 F3 L2 V2 T 
111 PiO R6 S2 K2 
R19 A7 S4 L2 Q 
F13 W I 
V14 H2 A 
F 

S 



L M T C P 



Y13 
F14 
Y32 
U2C K3 



II S 



D3 



A12 S5 Q3 P3 W3 
K16 A6 T2 E2 32 
AlB S3 K3 L2 T2 
G13 KIO .'15 Q2 R H M 
L9 Q7 K7 A2 F2 .R2 M C 
C33 

Q12 Ell lA K2 V2 Y N 
T12 P5 K4 Q3 E2 L2 
F33 

Vll 18 T3 D2 N2 Q2 
Y3 1 W2 
C27 S5 R 
G33 

C31 T A 

R13 C9 K4 Q3 D2 



L2 T2 K C R 
R2 G H V 



T N 



V S R A 



F H P R K 



BPTI 



R 
P 
D 
F 
C 
L 
E 
P 
? 

V . 

T 
C 
P 
C 
K 
A 
R 

T 
I 

R 
V 
F 
Y 

n 

A 
K 
A 

c 

L 
C 
Q 
T 
F 
V 
Y 
C 

c 
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Table 15: continued. 




Res. 
40 
4 1 
42 
43 
44 
45 
46 
47 
48 
49 
50 
51 
52 
53 
54 
55 
56 
57 
58 
59 
60 
61 
62 
63 
64 



Number 
Di f ferent 
AAS 

3 
9 

2 ' 

3 

2 

S 

2 

9 

7 

6 

1 

7 

a 

7 
1 
8 
8 
8 
9 

6 . 

3 

2 

2 

2 



Contents 
G22 All 
N20 Kll D2 

All R9 S4 C3 H2 D Q K N 
■ N31 G2 
N21 Rll K 
Y 

£2 32 
T19 S14 
All 19 E4 T2 
E19. D6 A2 Q2 
E16 D12 L2 M 
C33 

R13 MIO L3 E3 Q2 H V 
R21 Q3 E2 H2 C2 G K 



F32 
K24 



D H V 



W2 
K2 
Q I 



R 



L2 R 
T H 



K D 



T23 A3 
C33 

C15 V8 



V2 E2 



13 E2 



I Y K 



P2 



R2 
-2 



G19 V4 A3 
All -10 P3 K3 S2 
-24 G2 Q E A Y S 
Q R I G D , 



A L 
R L : 

Y2 

P R 



-28 
-31 
-32 
-32 
-3 2 



A 

K 
R 
N 
N 
F 
K 
S 
A 
E 
D 
C 
K 
R 
T. 
C 
G 
C 
A 



m 



IS 



ii 




Q 



A 



A 
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Table 16: Exposure in BPTI 
Coordinates taken from 

Brookhaven Protein Data Bank entry 6PTI. 

HEADER PROTEINASE IKHIBITOR (TRYPSIN) ID-KAY-B? 

COMPND BOVINE PANCREATIC TRYPSIN INHIBITOR 

COMPND" 2 C/DPTIS, CRYSTAL FORM /HIS) 

AUTHOR A.WLODAWER 

Solvent radius =1.40 
Atomic radii given in Table 7 



Areas in Angstrorr.s-squa red . 



Residue 



ARC 
PRO 
ASP 
PHE 
CYS 
LEU 
GLU 
PRO 
PRO 
TYR 10 
THR 11 
GLY 12 
PRO 13 
CYS 14 
LYS lb 
ALA 16 
ARC 17 
ILE 18 
ILE 19 
AJ5G 20 
TYR 21 
PHE 2 2 
TYR 2 3 
ASN 2 4 
ALA 25 
LYS 2 6 
ALA 27 
GLY 28 
LEU 29 
CYS 30 
GLN 31 
THR 3 2 



Not Not 
Total Covered • covered 
area by M/C fraction at all fraction 



342.45 
239 . 12 
272 .39 
311. 33 
241.06 
280.93 
291.39 
236.12 
236.09 
330.97 
249 . 20 
184 .21 
240.07 
2 37 . 10 
310.77 
209 .4 1 
351.09 
277 . 10 
278 .03 
339. U 
333 ,60 
306.08 
333.66 
264 .88 
211.15 
313 .29 
210.66 
186 .63 
280.70 
238,15 
301.15 
251. 



.26 



205.09 
92.65 
158.77 
137 .62 
48.36 
151. 45 
128.91 
128.71 
109.82 
153.63 
80.10 
56.73 
13C.25 
75.55 
2C0. 25 
66. 63 
243.67 
100. 51 
146. 06 
144.65 
102.24 
70.6=; 
77.05 
99.03 
85. 13 
216. 14 
96. C5 
71.52 
132.42 
57 . 27 
141.80 
138. 17 



0 . 5939 
0 . 3375 
0. 5829 
0,44 27 
0 . 2006 
0. 5300 
0.4424 
0.5451 
0.4652 
.4 642 
.3214 
.3031 
. 54 26 
, 3136 
. 64 44 
, 3 132 
. 6940 
,3627 
. 5254 
.4266 
. 3065 
.2303 
,2275 
.3739 
.4032 
. 6399 
.4560 
.3823 
0.4718 
0. 2405 
0. 4709 
C. 5 4 99 



152.49 
47 . 56 

143.23 
4 3.21 
0.23 

115.87 
90.. 39 
99 .98 
4 5.80 
79 .49 
64 .99 
23,05 
75.27 
53 . 52 

192 . 00 
45 . 59 

201,43 
53 .95 
96 . 05 
43.31 
60.67 
23,01 
17.34 
33.69 
48.20 

202 . 84 



. 78 
. 09 



93.61 
19.33 
82 . 64 
76,47 



4453 
1989 
5253 
1388 
0010 
4 124 

3 102 

4 2 J4 
1940 
2402 
2603 
1252 
3126 
2257 
6178 

2 177 
5730 
2127 
3455 
1292 
2089 
0752 
0512 
1461 
228 J 
6474 
2601 
1713 

3 3 35 
0812 
2744 
3043 



6PTI 



II 



i 




.1 



1 



2C1 



Table 16, continued. 



PME 


33 


304 


.27 


59 


79 


0. 


196 5 


18. 


91 


0, 


0622 


VAL 
TYR 


34 


251 


.56 


109 


73 


0. 


4364 


42 . 


36 


0 


1684 


35 


332 


. 64 


80 


52 


0. 


2421 


15. 


05 


0 


0452 


GLY 


36 


187 


.06 


11 


90 


0, 


0636 


1. 


97 


0 


0105 


GLY 


37 


185 


. 28 


84 


26 


0. 


4548 


39. 


17 


0 


2114 


CYS 


38 


234 


. 56 


73 


64 


0. 


3139 


26, 


40 


0 


1125 


ARC 


39 


417 


. 13 


304 


62 . 


0, 


7303 


250, 


73 


0 


6011 


ALA 


40 


209 


.53 


94 


01 


0 


4487 


52. 


95 


0 


2527 


LYS 


4 1 


314 


. 60 


166 


23 


0. 


5234 


108 , 


77 


0 


34 57 


ARC 


42 


349 


. 06 


232 


83 


0 


6670 


179 


59 


0 


5145 


ASU 


4 3 


266 


.47 


36 


53 


0 


1446 


5 


32 


0 


0200 


ASN 


14 


269 


. 65 


91 


03 


0 


3373 


23, 


39 


0 


0367 


PHE 


45 


313 


. 22 


69 


73 


0 


2226 


14 


79 


0 


0472 


LYS 


46 


309 


.83 


217 


18 


0 


7010 


155 


73 


0 


5026 


SER 


47 


224 


.78 


69 


11 


0, 


3075 


24 


80 


0 


1103 


ALA 


48 


211 


.01 


8 2 


06 


0 


3889 


31 


07 


0 


1473 


GLU 


49 


286 


.62 


161 


00 


0 


5617 


100 


01 


0 


2489 


ASP 


50 


299 


.53 


156 


42 


0 


5222 


95 


96 


0 


3204 


CYS 


51 


233 


.63 


24 


51 


0 


1027 


0 


00 


0 


OCOO 


MET 


52 


203 


. 05 


89 


43 


0 


3054 


66 


70 


0 


2276 


ARC 


53 


356 


.20 


224 


.61 


0 


6306 


139 


75 


0 


53 27 


THR 


54 


251 


. 53 


116 


.43 


0 


4629 


51 


64 


0 


2053 


CYS 


55 


240 


.40 


69 


.95 


0 


2910 


0 


00 


0 


0000 


GLY 


56 


184 


. 66 


60 


.79 


0 


3292 


32 


78 


0 


1775 


CLY 


57 


106 


.58 


49 


.71 


0 


4664 


38 


28 


0 


3592 


-ALA 


SO 


no position given 


in Protein Data 


Bank 



"Total area" 



"Not covered 
by M/C" 



"Not covered 
at all" 



is the area measured by a rolling sphere 
of radius 1.4 A, where only the ator.3 
within tho residue are considered. This 
takes account of conf orr.ation. 

is the area neasured by a rolling sphere 
of radius 1.4 A vhcre all r.ain-chain atoms 
are considered, fraction is the exposed 
area divided by th« total area. Surface 
buried by uain-chain ator.s is nore 
definitely covered than is surface covered 
by sido group atons. 

is the area neasured by a rolling sphere 
of radius 1.4 A vhere all ator.s of the 
protein arc considered. 




le 17: PlasDidG uccd in- Detailed Example 



Phage 
LCI 

pLG2 

pLC3 
pLC4 



pLG5 
pLG6 ' 
pLG7 

ptx:8 

pLG9 
pLGiO ■ 
oLGll 



t 7nnter.t5 

MlDnplB with AV3 II/Aat II/Acc I/aS-C 
Il/Sau I adaptor 

LCI with ir2^' and ColEl of.p3R22: cloned 

into Aac II/Aa!2 I sites 

pLC2 with rSlfl I site removed 

pLC3 with firr.t part of o cp-pbd gene 

cloned into iise H/Sau I sites, 

Avr II/Asu ir sites created 

pLG4 with second part of osp-phd qene 

cloned into ^ H/Asu II sites, Hs^ I 

site created 

pLC5 with third part of gsp-pbd ocr.e 
cloned into Asu II/8fL^ I sites. Hbe I 
site created 

pLG6 with last part oZ ot:p-^i?.i 
cloned into nriS I/Asu II sites 
pLG7 vith disabled oso-phd qene, Su-e 
length DHA. 

pL07 nudtcd to display BPTI ( V IS^pxr ) 
pLC3 + cex^ rjcnc - an2^ gene 



pIvCO + te^ 



.ir.o ^ gene 



c 




T 

r 

/ 
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Tabic 13: Enzy-e sites elininated when 
M13r.pl3 is cut by Ava II 
and Br.u3 6 I 



10 



15 



20 



25 



30 



35 



40 



Aha 11 
Fsp I 
EcoR I 
Sna I 
Hind III 
Hi nd II 



Aat II 
Bbv II 
BstS I 
EC057 I 
Esq I 
Khe I 
Pf IM I 
Rsr I 
Spe I 
' Xca I 



Kar I 

Bai I 

Sac I 
BapH I 
Acc I 



Cdi II 
HQ IE II 
Kpn I 
Xba I 



i:vu I 
Dsu:6 I 
X.-.a I 
Sd !■ I 



Table 19: Enzyr.es noc cutting 
M13mpl3 



Afl I 
Scl I 
BstE II 
EcoK I 
Hpa .1 
N'ot I 
Pmac I 
Sac I 
Stu I 
Xho I 



A pa I 
DspM I 
PstX I 
ECOO109 I 
Mlu I 
r.'ru I 
Ppa I 
Sea I 
Sty r 



SlisK I 
E^g I 
Ecc.R V 
Mco I 

FpuM I 
Sfi I 
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Tabic 20: Enzynos cutting 
Ado R gene and ori 



10 



15 



Aat II 
Sea I 
Pvu I 
Hind II 
Kde I 



Iby II 
TthXU I 
FGB I 
Pst I 



Eco57 I 
Aha II 

ml I 

Xba I 



Ppa I 
Gdi II 
HoiE II 

Afi rir 



m 



m 




Tab Is 2 1- Enr.yr.c 



-^ Acc I 

ASU II 
A%M III 



Kecoqnitiion 
CTMKAC 
CTTAAG 
CCCCCC 
TTCGAA 
ATGCAT 



10 



15 



25 



";0 



3 5 



Avr II 
33 -H I 
Bci I 

BssH II 
+ Bs^ II 
VBS^X I 
-Ora II 
I 

£ccR I 

+ ESD I 

Hind III 
Hca X 
Kpn- I 



Ijar 



£r^ I 
^Poi^ I 
+Rsr II 
Sac I 



CCTAGG 
CGATCC 
TCATCA 
TCCGGA 
GCGCGC 

cgt;:acc 

CCANNSNM 

RGGNCCY 

CCTNNN'Ni' 

G^AT rc 

GATATC 

CCTNACC 

AAGCTT 

CTTAAC 

GGTACC 

ACGCGT 

GGCGCC 

CCATGG 

GCTAGC 

GCGGCCGC 

TC3CGA 

CCANNNNN 

CACGTG 

RGG-^-CCY 

CCGoCCG 

GAGCTC 

GTCGAC 

cct::acg 



-rsfi I CGCc::t:!:::NGGCC 



;5 



I 

Soe I 

Son ^ 

*?SCV I 

I 



:CCGCG 
ACTAGT 
GCATGC 
ACGCCT 
CCV^GG 
GTATAC 



s tQ5-ed on Acciq CN'A 



cuts 
2 i 

1 i 
5 i 

2 i 

5 & 



I 
I 

; 

L 

1 
I 

8 
?. 
5 
1 
3 
2 
1 

: 

5 

I 
2 
1 
I 



Supply 

4 <B.M.I,N,?.T 

5 <M 

1 <M, ?,T 
4 <?,:i{5stB I) 
1 <T; N5_k t:M.:f,P.T; 
^c3t:: i:t . 

<S.3,M,I.S*,T 

<:i.T 

<S,3,M.N,T 

( soon) 
<S.3,M, I.N.P.T 



3 
7 
3 
2 
2 
5 

1 
2 

3 
3 
1 
5 
3 
1 



5 
3 
5 

5 
6 
4 
5 
o 

5 

3 

5 

5 

3 

1 

5 
4 
5 
5 

u 
3 



<S.3,M.I.'^.?'T 
.:S, 3,M. r .N, ? 

<s. 3,.^.. I.N. ?,T ; 

3 <ncne 
5 --N 

5 <N,r 

I <3(S5t I) ,M,I.N,?. 



3 i 



5 <B.M.:.:i,?.T 

5 Cvn 1:3: I 

:T; D^u:^ t:N: Aoc I:: 

<3.r--, I.N'. P'"^ 

<-.NMJArt I) 
<N (seen) 





0 



2G7 

Table 22: lobri gene 



pbd modlO 20III68 : 

lacOVS Rsr II/Ayj: II/genc/XllSA attp.nuator/Mst ri; ! 
5'- CGCdCCG TaT " ! £sr II site 

CCAGGC tttaca CTTTATGCTTCCGGCTCC ■ tatafit GTG " I lacUVS 
TGG aATTGTGAGCCCATAACAATT ! 
CCT AGCAqq CtcaCT ' 
atg aag aaa tct ccg gtt ctt aaq get age I 
gtt get gtc gcg acc ccg gta ccg atg ctg ! 
tct ttt get cgt ccg gat ttc tgc etc gag 1 
ccg.eca tat act ggg ccc tgc aaa gcg cgc ! 40 
ate ate cgt tat ttc tac aac get aaa gca 
ggc ctg tgc cag ace ttt gta tac ggt ggt 
tgc cgt get aag cgt aac aac ttt aaa teg 
gee gaa gat tgc atg cgt acc tgc ggt ggc 
gee get gaa ggt gat gat ccg gcc aaa geg 
gee ttt aac tct ctg eaa get tct get acc 
gaa tat ate ggt tac gcg tgg gee atg gtg 
gtg gtt ate gtt ggt get acc ate ggt ate 
aaa ctg ttt aag aaa rtt act teg aaa gcg 

Bst^ II 



lacQ opersator 
Sh ine-Da li^a rno seq. 
10. m;3 leader 
20 
30 



50 

60 

70 

30 

90 

100 

110 

120 

130 



tct taa tag tga gnttacc ! 
agtcta agcecge ctaatga gcgggct tttttttt 
CCTgAGG -3' ■ ! Mst II 



tem inator 



c 




i 



/I 



i 



.'I 



3 



9 



O 



Table 23: i pbd DMA sequence 

DMA Sequence file « UV5_M1 3 PTIMl 3 . DNA : 17 
DNA Sequence title = 
pbd modlO 2911183 : Uc-UV5 Rs r t I/Avr I I/geno/TrpA 

a ttcnuator/MstI I ; ! 

1 C|GCA|CCClTATlCCA|GCC|TTTl ACA I CTT I TAT I CCT I TCC I GCC I TCC I 
4 I TAT I AAT | CTC | TGG | AAT [ TGT | C AG | CGC [ ATA | ACA | ATT | CCT | ACG t ACC I 
8 3 CTC I ACT I ATG | AAG | AAA | TCT | CTG | CTT 1 CTT | AAG | CCT | ACC I GTT | GCT I 
12 5 GTC I CCG I ACC | CTG | GTA | CCC \ ATC [ CTG [ TCT | TTT | CCT | CGT | CCC ! C AT ! 
167 TTC I TGT | CTC | GAG | CCC | CCA | TAT | ACT { GCC | CCC ! TGC 1 AAA | GC3 | CCC i 
209 ATC I ATC | CGT | TAT | TTC | TAC 1 :'^-AC | GCT | AAA | GCA | GCC | CTC j TGC | CAG j 

2 51 ACC I TTT-l GTA I TAC | GGT [ GGT | TCC i CGT | GCT [ AAG | CGT | AAC | AAC ; TTT i 
29 3 AAA|TCG|CCC|GAA|GAT|TGCI ATC;CGT| ACC|TGC|GCT|CGC(GCCjGCT! 

3 35 GAA|GGT|GAT|CAT|CCGiGCC! AA.\iGCG;GCC|TTT| AAC|TCT!CTC|CAA| 
3 7 7 GCT I TCT | GCT | ACC | C/JV | TAT | ATC | CGT | TAC | GCC | TCC | GCC \ ATG \ GTC I 

4 19 CTC I GTT | ATC | CTT | CGT | CCT | ACC | ATC | GGT | ATC | AAA I CTG j TTT | AAG | 
4 61 AAA|TTT| ACT|TCG|AAA|GCC|TCT|T.\A|TAG|TGA|CGTiTAClCAC|TCT! 
50 3 AAG I CCC | GCC | TAA | TGA | CCG | GCC j TTT | TTT | TTT | CCT j C AG | G 

Total = 539 bases 



El 

|i 
P 




0 



Q 
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Tabic 2-;: Sur.r.ary of Restriction Cuts 



Enz 
Enz 
Enz 
Enz 
Enz 
Enz 
Enz 
Enz 
Enz 
Enz 
Enz 
Enz 
Enz 
Enz 
Enz 
Enz 
Enz 
Enz 
Enz 
Enz 
Enz 
Enz 
Enz 
Enz 
Enz 
Enz 
Enz 
Enz 
Enz 
Enz 
Enz 
Enz 
Enz 
Enz 
Enz 
Enz 
Enz 
Enz 
Enz 
Enz 
Enz 
Enz 
Enz 
Enz 
Enz 
Enz 
Enz 
Enz 
Enz 



%Acc I has 
Acc in ha 
Acv I has 
Af 1 II has 
% Af I III has 
Aha III has 
Aua I has 
Asd7 13 
Asu II 
^Ava I 
Avr II 

Bbe I 
^Bal I 
+ B in I 



has 
has 
has 
has 
has 
has 
has 
has 



X BsvM I has 
BssH II has 
^- estE II tis 
\ 3stX I has 
Cfr I has 
+Dra II has 
^ Zso I has 



1 obscir/ed sites : 259 
1 observed sites : 162 
I observed sites : 328 
1 observed sites : 109 
1 observed sites : 404 
1 observed sites : 292 
1 observed sites : 
1 observed sites 
; observed sites 
1 observed sites 
1 cbser^'ed s i tes 
3 observed s ices 
1 observed sites : 
1 obccir^ed sites : 
1 observed s ites : 
1 obser--'ed sites 
1 obser/ed sites 

1 observed sites 
1 obse r*^ed sites 
obser'vod s ites 



193 . 
133 
471 
175 
76 

138 323 540 
328 
352 
346 
: 319 
: 205 
493 
4 13 
299 350 



1 obsen/ed sites 
^ Zso I has 1 observed sites 
VFok I has 1 observed sites 
Gd i II has 2 obser-v'ed sites 
Hae I has 1 observed sices : 
Hae II has 1 obser^/ed sites : 
*HTTa I has 1 obser-zed sices : 
X H^iC I has .3 obser/ed sites 

1 cbseiTvcd sites 
1 observed sites 
obser^'ed sites : 
1 observed sites : 

2 obser"/ed sites 
1 observed sites : 

has 1 cbser/ed sites : 
has 1 obscr/ed sites : 
has 1 observed sites : 
has 1 obcor^/ed sites : 

s 1 obser'/cd sites 
1 observed sites : 
1. obseo.'ed sites 
1 observed sites : 

1 obscr-/ed sites 
1 observed sices : 

2 obsc rved sites 
L observed sites : 

observed sites 
observed sices 



^HqiJ II has 
Hind III , has 
-r Hch I has 
>'pn I has 
+51bo' II * has 
Hlu I has 
Nar I 
Nco I 

rrhe t 
Nru I 
NsDf7524) ha 
?;scD II has 
has 
has 

has 
has 

has 
has 



: 193 
277 
213 

299 350 
240 
323 
473 

: 1:3 323 540 
: 103 
: 377 
340 



+ Pf IM I 
- Fss I 
+Rsx II 
+ S3jJ I 
iSfaN I 
+ Sli I 
Soh I has 
Stu I has 
\ Stv I has 



138 
: 93 
404 
328 
413 
115 
123 

: 311 
332 
: 13 4 
193 



304 



2 observed sites 



535 
: 144 209 

351 
311 
240 

76 4 13 



• 



As* 

m 

m 

m 



m 



i 



i 
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Table 25: Annotaccd Sequence of jpbd gene 

5'- C|GCA|CCC|TAT|CCA|GCC|TTT(ACAlCTT|TAT| 
• I Rsr ri I J ^11 L 



28 



✓ 



I GCT I TCC I CCC I TCG I TAT | AAT | CTG I TCG I 

I -io L 



I .\AT 1 TGT ( GAG | CCG 1 ATA | ACA 1 ATT I 
I U^c operator __i 



I CCT ! AGG I ACG | CTC ] ACT | 
I Avr Til 



52 



73 



88 



t S. D. I 



m 


k 


k 1 s 


1 


V 


1 


k 


a 


s 


1 


2 




5 


6 


7 


3 


9 


10 


ATG 


AAG 


AAAj TCT 


CTC 


CTT 




A.-\G 


CCT 


AGC 










Afl TT_ 





118 



I 1^1 12 
|GTT^ GCT 



V 


a 


t 


1 


V 


13 


14 


15 


16 


17 


CTC 


GCG 


ACC 


CTG 


CTA 



N ru I I 



19| 20| 



P 

1 3 I f j 
CCC I ATG 1 CTC ■ 



1 Kun II 



s 


i 


a 


r 


P 


21 


22 


23 


24 


25 


TCT 


TTT 


Gcr 


CCT 


CCG 



d I f I c 
26] 27t 23 
CAT I TTC I TCT 
I Acciri i 



I 


^ 1 


29 


30i 


CTC 


GAG 1 


Av.-i T I 


xho r ! 



173 



p 

31 
CCG 



P 

32 
CCA 



y 

33 
TAT 



t 
34 
ACT 



9 
35 
GGG 
L 



P C 
36 I 37 
CCC I TCC 



k a 
33 I 39 
AAA 1 GCG 



^ I 
40] 
CCC 



I Ana r I 
I Dra II \ 
I Pss t I 



203 



1 


i 


r 1 y 


f 


y 


n 


A 


41 


42 


4 3 44 


45 


46 


47 


48 


ATC 


ATC 


CCT 1 TAT 


TTC 


TAC 




OCT 



k 

49 I 
Aj\A I 



235 





272 



a 

SO 
CCA 



q 

51 

ccc 



I I c 
52 53 
CTC I TCC 

JU. 



Table 


25, 


con'- inucd . 


q 


t 


f 


V 


y 


q g 


54 


55 


56 


57 


53 


59 1 60 


CAC 


ACC 




CTA 


TAC 


CCT|CGT 








. ACC T 










Xca T ! 
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c 
61 
TCC 



70 
TCC 



r 

62 
CGT 



. v. 

61 I 64 
GCt| AAG 



r 

65 
CGT 



n n 

661 67 
A^\C I fsAQ 



71 
CCC 



e d 
72 73 

caa|cat 



c 
74 
TCC 



75 

atc 



r 

76 
CGT 



t 

77 
ACC 



k 

69 



C 
78 
TCC 



Snh T I 



79f 
GGTI 





a 


a 


e 


g 


d 


d 


BO 


81 


B2 


83 


84 


85 


86 


CCC 


CCC 


CCT 


CAA 


CGT 


GAT 


CAT 


Bbe X 












Mar r 













205 



325 



346 



P 
87 
CCC 



a I k I a 
881 39 90 
GCC| /w^AiGCG 

sfi r 



a 

91 
CCC 



f 


n 1 s 


1 


q 


a 


s 


a 


t 


92 


93 94 


95 


96 


97 


93 


99 


100 


TIT 


AAC 1 TCT 


CTG 


CAA 


CCT 


TCT 


CCT 


ACC 



jltir.d 3 [ 



361 



3U8 






-> ^ 
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Table 25, continued. 



i 


V 


g 


a 


113 


114 


115 


116 


ATC 


GTT 


GGT 


OCT 


k 


1 


f 


.Jc 


121 


122 


123 


12-: 


AAA 


CTC 


rv'V 


/xAC 


s 








131 


132 


13 3 


134 


TCT 


TAA 


TAG 


TCA 



c 

117 

ACC 



113 
ATC 



k I f 
1251 126 
AAA! TTT 



g ^ 

119i 120 
GGT ATC 



t I s j k I a I 
127 128 12Q| 130 
ACT I TCC ! AAA I CCC I 
■Asu ir! 



448 



GGTjTACj CAC|TCTl 

bsiie: Elt 



AAG| CCC I CCCI TAA iTCAl CCC I CCC I TTT! TTT I TTTI 
J Trp terr.inatcr -J 



CCTlCAG|C -3' 
Sail T I 



ote the following^ enzyne equivalences. 



= sspm i: 

= ECOO109 I 
= Bsc 3 I 
= 3S U36 I 



502 



532 



539 






21 A 



rable 26: 0:JA_seql 



5 ' I ccg I tec I g tC I CCA | CCC | TAT | CCA | GCC | TTT | ACA | CiT j TAT 1 
I spacer I Rsr II I I -35 



t CCT I TCC I CCC I TCC I TAT | AAT I CTC | 7CC | 

' I "10 I 



I AAT i TGT I CAC | CCC j ATA | ACA | ATT | 
I lac operacor L 



j CCTl ACG| 
I Avr ll \ 





k 


a 


128 


129 


i:o 


TCC 


/\AA 


CCC 



Igcc(gctlccT 
spacer [As-j TI| 



s 








131 






134 


TCT 




TAG 


TCA 



CCT ! TAC I CAC I TCT | 
Bsr:£ Hi 



I AAC I CCC I CCC I TAA j TCA \ CCG ( CCC | TTT | TTT | TTT | 
I Trp terninatcr [ 



|CCT|GAG|Cc;\|ggt|gag|cg - 3' 
I Sau I I spacer [_ 



li? 
P 




V r 



* 



/ 
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Table 27: Dt/A_synthl 
5' I CCG I TCC t err ( r,G A I CCC I T AT ! CCA ! CCC t TTT I ACA I CTT I TAT I 
I GCT I TCC I c r:C I TCC i TA T ( AAT I CTG I TCC I 



! AAT I TCT t G AG I CCCi AT A ■ ACA I ATT ! 



[CCTl AGGi 
gga tec 



/ 3' = oliqn 
i CCC I GCT ! CCT I TCC ■ AAA | CCG { 
cqq cga qqa age ttt cqc 



I TCT ( TAA I TAG t TGA | GCT \ TAC | CAC | TCT 1 
aga'atit ate act cca atg gtc aga 



I AAG i CCC I CCC I TAA | TGA [ GCC \ GGC | TTT | TTT ] TTT \ 
ttc ogg egg att act cgc ccg o.aa aaa aaa 



( CCT I GAG I GCA | GCT | GAG ■ CO 
gga etc cgt cca ccc gc - 5' 



"Top" strand 00 

"Bottom" strand 100 

Overlap ' 23 (14 c/g and 9 a/ t) 

IJct length 158 



I ■■ 
r 





-i 



J 



■0 



277 

Table 29: DNA_synth2 
5'- JC^CAlCCAl ACC I. 
Irr jl Ar.r:(AGG[C TC|ACTl 

|ATCl AAGI A.AA|TCTlCTGlGTTlCTT'AAC.lGC T|AGCl 

I GTTi CCT I GT f^ I f^CG 1 ACC I CTG I GTA I CCG t ATG 1 CTG | 
oligfiS = 3'- ggc tac gac 

/ 3' = oligi;5 
jjv^rjxT ^ i ^crr I cgt I ccc [ cat ] ttc | tgt | ctc [ gag | 
aga aaa "cga gca ggc eta aag aca gag ctc 

I CCG 1 CCA I tat I ACT [ CCG | CCC 1 TCC | A.^J\ 1 GCG i CCC | 
ggc ggt ata tya ccc gag acg ttt cgc gcg 



|ATC| ATCiCCTi 
tag tag gca 

I ACT t TCC I ,\.\A \ GCG 1 GCT I GCG | 
•tga age ttt cgc cga cgc - 5' 



"Top" strand 
"Bottom" strand 
Overlap 
Net' length 



99 
90 

24 (!•; c/g and 10 a/t) 
155 



if- 
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Table 30: DNA_seq3 





a 


r 




39 


40 


5'- 1 ccc 1 tgcl aca 


GCG 


CCC 







i 


i 


r 


y 


f 


y 




a 


k 








42 


43 


44 


45 


46 




48 


49 






ATC 


ATC 


CGT 


TAT 


TTC 


TAC 


A.\C 


CCT 








a 


g 


1 


c 


q 


t 


f 


V 


y 


g 


q 


50 


51 


52 


53 


54 


55 


56 


57 


58 


59 


60 


CCA 


CCC 


CTG 


TGC 


CAG 


ACC 


TTT 


GTA 


TAC 


GCT 


GCT 


I 


Stu 


Xi 










ACC I 




















Xca I 






c 


r 


a 


k 


r 






f 


k 






61 


62 


■63 


64 


65 


66 




68 


69 






TGC 


CGT 


GCT 


AAG 


CGT 


AAC 


AAC 


ri'T 


AAA 







Esp I . 



s 

70 
TCC 



7l| 72 
GCcjcAA 



IXmalllf 



d 
73 
GAT 



c 

74 
TGC 



75 
ATG 



76| 77 
CCT ACC 



C q 
73 79 
TCC GCT 



Suh I' 



q 


a 




60 


81 




CGC 


GCC 


get 1 gaa 


Bbe r 


soacer 


N^r X 







t 


s 


k 




127 


128 


129 


ttt 


acT 


TCG 


AAa 



|Asu III 



gcg| tcg|ccg| 



m 




0 



k 



1 



/I 



1 



i 

i 



279 

Table 31: Dr/A_synth3 
^ - I rcc I TGC I ACA I CCG I CGC I 
I ATC ! ATC I CCIT ! TAT ■ TTC l TAG ' A AC I CCT I AAA I 



i CCA I GGC 1 CTG I TCC ' CAC ) ACC 1 TTT ! CTA I TAC I GGT ! GGT I 
oiiq?3 = 3'- g cca cca 



/ 3' = olig?l7 
I TGC I CCT I r.CT I AAG ! CGT I AAC I AAC ) TTT [ AAA | 
acg gca cga ttc gca teg ttg aaa ttt 

I TCG I GCC I GAA 1 CAT l,TCC 1 ATC j CCT | ACC 1 TGC | GGT | 
age egg etc cca acg cac gca egg acg cca 

I GCC) GCC [CCT I GAA I 
ccg egg cgt czz 

I TTT I ACT 1 TCG | AAA | GCG j TCG | CCC I 
aaa.tqa age ttz cgc age ggc -5' 



"Top" strand 
"BottoQ" strand 
Overlap 
Net length 



93 
97 

25 ( 15 g/c & 10 a/'t) 
146 



& 

m 
m 



pi 





Q 



Table 32: DNA_seq^ 









9 


a 


a 


e 


g 


d 




5' 






80 


81 


82 


S3 


84 


85| 86 


1 cct 


lege 


cct 


GGC 


CCC 


OCT 


GAA 


CGT 


CAT 


GAT 


snacer 


Sbe T 


















Mar r 














P 


a 






a 












87 


88 


89 


90 


91 












CCC 


CCC 


AAAjGCC 


CCC 












1 


sfi r 
















f 


n 


s 


1 


a 


a 


s 


a 


t 




92 


93 


94 


95 


96 


97 


98 


99 


100 




rri" 


AAC 


TCT 


CTC 


CAA 


GCT 


TCT 


CCT 


ACC 












|H 


ind 3 1 







e 
101 
GAA 



y 

102 
TAT 



1 

103 
ATC 



104 105 
GGT I TAC 



a 
106 
GCG 



V 

107 
TGG 
I 



I a a 
j ICS I 109 
|gCC| ATG 

BstX 



V 

110 
CTC 



ml 112 

GTG I CTT 



Nco I 1 



i 


V 


g 


a I t 


113 


114 


115 


116! 117 


ATC 


GTT 


GGT 1 GCT 1 ACC 


Jc 


1 ■ 


f 




121 


122 


123 


124 j 125 


AAA 


CTC 


TTT 


AAC { .\A^ 



1 

118 
ATC 



g I i I 

119| 120 
CCT [ATC I 



f I t I S I k I 
126 1271 123 129 
TTt| ACTlTCci AAaiqcgl tcqiggc 
|Asu TTj spacer 




fas 



m 



m 



H 
m 





23 1 



Taiale 33: D::A_synth4 
5' I GCT I CGC i err I f:r:c I gc c ( cct I c\a t CCT I cat ! c at ! 

|c:CG|CCCl AAAfCCGlCCC! 

I TTT I AAC I TCT \ CTG | CAA I G CT ' TCT ' CCT [ ACC j 

I GA A 1 TAT ' AT C I GGT I TAC ' CCG I TGG I 
oLigSlO = 3'- Ota tag cca otg cgc acc 



/ 3' = oLiq§9 
|GCCI ATGlG TC | GTC | GTT | 
egg tac cac cac caa 



I ATC I GTT I GCT | CCT j ACC | ATC | CCT | ATC i 
tag caa cca cga tgg tag cca tag 

I AAA I CTG I TTT I AAC I AAA I TTT I ACT I TCG ; AAA I CCG ; TCT i TCA | 
ttt gac aaa ttc ttt aaa tga age ttt cgc aga act - 



"Top" strand 100 

"Bottom" strand 93 

Overlap 25 

Net length 149 



(14 c/g and 11 a/t) 




A 






Res . 
i 



-3 
-4 
-3 
-2 
-1 

1 

2 

3 

4 

5 

6 

7 

8 

9 
10 
11 
12 
13 
14 
15 
16 
17 
18 
19 
20 
21 
22 
23 
24 
25 
26 
27 
28 
29 ^ 
.30 
31 
32 
33 
34 
35 
36 
37 
38 
39 
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Table 34: Some interaction sets in BPTI 



Numbei 
Dif f . 



Contents 



BPTI 1 2 3 4 5 



2 

- 2 

5 
10 
10 
10 

9 
10 

7 

1 
10 

5 

7 

9 
10 
10 

2 

5 

3, 
12 

7 
12 

6 

7 

5 

A 

6 

2 

4 
10 

9 

5 

7 
10 

1 

7 
11 

1 
11 

2 

3 ' 
1 
3 
7 



0 -32 
E -32 

T P F Z -29 

23 R3 Q2 T2 H G L K E -18 
D4 T2 P2 Q2 E C N K R -18 
R21 A2 K2 H2 P L r T G D 
P20 R4 a: H2 N E V F L 



Y C A L 
S 



Q2 12 Y2 02 T 
Q 



D15 K6 T3 R2 P2 S 

F19 D4 L3 Y2 12 A2 
C33 

Lll E5 »4 K3 
L18 Ell K2 S 

P26 H2 A2 I L G F 

P17 A6 V3 R2 Q L K Y F 

Yll E7 04 A2 N2 R2 V2 S 

T17 P5 A3 R2 I S Q 
G32 K 

P2 2 R6 L3 U I 
C31 T A 



I 0 



Y V K 



K15 R4 Y2 M2 L2 -2 

A22 G5 Q2 R K 0 F 

Ri2 K5 A2 Y3 H2 S2 F2 

121 M4 F3 L2 V2 T 

111 PIO R6 S2 K2 L Q 

R19 A7 S4 L2 Q 

YIB F13 W I 

F14 YI4 H2 A N S 

Y32 F 

M2C K3 03 S 

A12 S5 Q3 P3 W3 L2 T2 

K16 A6 T2 E2 S2 R2 G 

A18 S8 K3 L2 T2 

G13 KIO H5. Q2 R II H 
L9 Q7 K7 A2 F2 R2 M G 
C33 

Q12 Ell L4 K2 V2 Y U 

T12 PS K-; Q3 E2 L2 G 
F3 3 

Vll 18 T3 

Y31 W2 

C27 S5 R 
G33 
031 



V G A I N F 



L M T G P 



K G 
H V 



T N 



V S R A 



02 N2 Q2 F H P R K 



T A 



R13 G9 K4 Q3 02 P M 



R 
P 
0 
F 
C 
L 
E 
P 
P 
Y 

C 
P 

c 

K 
A 
R 
I 
I 
R 
Y 
F 
Y 
N 
A 
K 
A 
C 
L 
C 
Q 
T 
F 
V 
Y 
G 
G 
C 
R 



X 

s 5 
4 s 




40 
41 
42 
43 
44 
45 
46 
47 
43 
49 
50 
51 
52 
53 
54 
55 
56 
57 
53 
59 
60 
61 
62 
63 
64 
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Table 34: continued. 



Number 
Res. DiCf. 

j AAs Contents 



G22 
H20 
All 
N31 
U21 
F32 
K2 4 
T19 
All 
E19 
E16 
C3 3 
R13 
R2I 
T2 3 
C33 
G15 
C19 
All 
-24 
-23 
-31 
-32 
-32 
-32 



All 

Kll 

R9 

G2 

Rll 

Y 

E2 

S14 

19 

D6 

D12 



D2 

S4 G3 H2 D Q K N 
K 

S2 D H V Y R 

E4 T2 V/2 L2 R K D 
A2 Q2 K2 T H 
L2 M Q K 



MIO L3 E3 Q2 H V 
Q3 E2 H2 C2 C K D 
A3 V2 £2 I Y K 

V8 13 E2 R2 A L S 

V4 A3 P2 -2 R L U 

-10 P3 K3 S2 Y2 R 

G2QEAYSPR 

Q R r C D 

T P 

D 

K 

S 



BPTT 12 3 4 5 



S 5 
4 s 
s 5 
s 
s 

5 

5 
5 



indicates secondary set 

indicates in or close to surface but buried and/or 
highly conserved. 






I. 



4 



Distances in Angstrons between C^Q^a^* 
Kypothfeticai C^^ta. '■'^^ added to each Glycine, 



119 
y21 
A27 
C23 
L29 
Q3L 
T32 
V34 
A-i 8 

E-;9 
y.52 



?9 

Til 

K15 

AIS 

113 

R20 

r2 2 

N2 4 

K26 

C30 

F33 

Y35 

SA7 

D50 

C51 

R53 

r:9 



R17 
V . 7 

15. 1 
22.6 
26. 6 
22.5 

16. 1 
11.7 

5.6 
13.5 

22 . 0 

23 . 



119 Y21 A27 C28 L20 Q3 1 T32 



/3 4 A4a 



14 . 0 
9 . 5 
7.9 
5.5. 
6. 1 
10. 6 
15.6 
19.9 
24 . 4 
^13.9 
10. 3 
3 . 4 
17.6 
20.0 
13.9 
25.4 
15.4 



3.4 

17 . 1 
2 0.4 
15.8 
10. 4 

5.2 
6.5 
11.0 
14.7 

11.3 
11.2 
14 . 6 
10. 1 

6 . 0 
[Table 36 distance values, including columns E49 and M52, not legible in this copy.]

Table 36, continued.
Distances in Angstroms between C-beta atoms; a hypothetical C-beta is added to each glycine.
Residues tabulated: P9, T11, K15, A16, I18, R20, F22, N24, K26, C30, F33, Y35, S47, D50, C51, R53, R39.
[Distance values not legible in this copy.]
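Table 36 gives glycines, which have no side chain, a hypothetical C-beta. The text does not say how those positions were built. The following is a minimal sketch, assuming the ideal-geometry construction widely used in structure-prediction code: the three coefficients are that construction's constants, not values from this document. It places a virtual C-beta from the backbone N, CA and C coordinates and then tabulates pairwise C-beta distances. The residue labels and coordinates in the usage lines are hypothetical.

```python
import math
from itertools import combinations

Vec = tuple[float, float, float]

def sub(p: Vec, q: Vec) -> Vec:
    return (p[0] - q[0], p[1] - q[1], p[2] - q[2])

def cross(p: Vec, q: Vec) -> Vec:
    return (p[1] * q[2] - p[2] * q[1],
            p[2] * q[0] - p[0] * q[2],
            p[0] * q[1] - p[1] * q[0])

def dist(p: Vec, q: Vec) -> float:
    return math.dist(p, q)

def virtual_cb(n: Vec, ca: Vec, c: Vec) -> Vec:
    """Place an idealized C-beta from backbone N, CA, C (assumed constants)."""
    b = sub(ca, n)           # N -> CA bond vector
    c_vec = sub(c, ca)       # CA -> C bond vector
    a = cross(b, c_vec)      # normal to the N-CA-C plane
    return tuple(-0.58273431 * a[i] + 0.56802827 * b[i]
                 - 0.54067466 * c_vec[i] + ca[i] for i in range(3))

def cb_distance_table(residues: dict[str, dict[str, Vec]]) -> dict[tuple[str, str], float]:
    """Symmetric C-beta distance table; residues without 'CB' get a virtual one."""
    cb = {}
    for label, atoms in residues.items():
        if "CB" in atoms:
            cb[label] = atoms["CB"]
        else:  # glycine: build the hypothetical C-beta from the backbone
            cb[label] = virtual_cb(atoms["N"], atoms["CA"], atoms["C"])
    table: dict[tuple[str, str], float] = {}
    for r1, r2 in combinations(cb, 2):
        d = dist(cb[r1], cb[r2])
        table[(r1, r2)] = table[(r2, r1)] = d
    return table

# Hypothetical coordinates: an idealized glycine backbone and one real C-beta.
gly = {"N": (1.458, 0.0, 0.0), "CA": (0.0, 0.0, 0.0), "C": (-0.54651, 1.42371, 0.0)}
tyr = {"CB": (5.0, 0.0, 0.0)}
table = cb_distance_table({"G37": gly, "Y35": tyr})
print(round(table[("G37", "Y35")], 1))  # 5.7
```

The virtual C-beta sits about 1.53 Angstroms from CA, which is the usual CA-CB bond length, so glycines can be compared with the other residues on the same footing.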



Table 37: vgDNA to vary BPTI set #2.1

[Codon-by-codon layout of the variegated DNA, assembled from overlapping oligonucleotides #27 and #28 (overlap = 12 bases: 7 G/C, 5 A/T), not legible in this copy.]

k = equal parts of T and G; m = equal parts of C and A;
q = (.26 T, .18 C, .26 A, and .30 G);
f = (.22 T, .16 C, .40 A, and .22 G);
* = complement of symbol above

Residue                    40     42     50     52
Abundance x 10 of PPBD     .763   .271   .459   .671
Possibilities = 21 x 21 x 21 x 21 x 21 x 21 = 8.6 x 10^7
[Remaining abundance figures (product, parent, least favored, and least favored one-amino-acid substitution from PPBD) not legible in this copy.]
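A key like the one in Table 37 fixes a nucleotide mixture for each codon position. The following is a minimal sketch of how such a key translates into amino-acid abundances. It makes three assumptions:

- The varied codons are written qfk: the first base is drawn from q, the second from f, and the third from k.
- The mixtures are read as q = (.26 T, .18 C, .26 A, .30 G), f = (.22 T, .16 C, .40 A, .22 G) and k = equal T and G. The scanned figures for q are partly illegible.
- Six codons are varied, each with 21 possible outcomes (20 amino acids plus stop).

The sketch also evaluates the quantity that a later claim uses to rank mixtures: (1 - abundance of stop codons) times the least abundant amino acid divided by the most abundant. As a comparison point it uses NNK (N = equal A, C, G, T; K = equal G, T), a common mixture that this table does not itself use.

```python
from itertools import product

BASES = "TCAG"
# Standard genetic code in TCAG order; '*' marks stop codons.
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODE = {
    b1 + b2 + b3: AMINO[16 * i + 4 * j + k]
    for i, b1 in enumerate(BASES)
    for j, b2 in enumerate(BASES)
    for k, b3 in enumerate(BASES)
}

# Assumed reading of the Table 37 key.
Q = {"T": 0.26, "C": 0.18, "A": 0.26, "G": 0.30}
F = {"T": 0.22, "C": 0.16, "A": 0.40, "G": 0.22}
K = {"T": 0.50, "G": 0.50}
N = {b: 0.25 for b in BASES}

def abundances(first: dict, second: dict, third: dict) -> dict[str, float]:
    """Probability of each amino acid (or '*') from per-position base mixtures."""
    out: dict[str, float] = {}
    for (b1, p1), (b2, p2), (b3, p3) in product(first.items(), second.items(), third.items()):
        aa = CODE[b1 + b2 + b3]
        out[aa] = out.get(aa, 0.0) + p1 * p2 * p3
    return out

def merit(ab: dict[str, float]) -> float:
    """(1 - stop) * least abundant / most abundant of the 20 amino acids."""
    aa = [ab.get(x, 0.0) for x in "ACDEFGHIKLMNPQRSTVWY"]
    return (1.0 - ab.get("*", 0.0)) * min(aa) / max(aa)

qfk = abundances(Q, F, K)
nnk = abundances(N, N, K)
n_varied = 6
possibilities = len(qfk) ** n_varied  # 21 outcomes per varied codon

print(len(qfk), round(qfk["*"], 3), possibilities)
print(round(merit(qfk), 3), round(merit(nnk), 3))
```

With these values only TAG can form a stop codon, so the stop fraction is 0.052. The ranking quantity comes out near 0.386 for qfk and about 0.323 for NNK. On this measure, then, the qfk mixture scores higher.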





Table 38: Result of varying set #2 of BPTI 2.1

[Codon-by-codon sequence, with amino acid assignments, for codons 29 through 81. The restriction sites Ava I/Xho I, Apa I, Esp I, Sph I and Nar I are marked. The sequence is not reliably legible in this copy.]
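Table 38 marks restriction sites along the construct. The following is a minimal sketch for locating such sites in a DNA string by their recognition sequences. The sequences listed are the standard ones for Xho I, Ava I, Apa I, Sph I and Nar I, and other enzymes can be added to the dictionary. The test fragment is hypothetical and is not the BPTI 2.1 gene.

```python
import re

# IUPAC symbols needed for the sites below (R = A/G, Y = C/T, N = any base).
IUPAC = {"A": "A", "C": "C", "G": "G", "T": "T", "R": "[AG]", "Y": "[CT]", "N": "[ACGT]"}

# Standard recognition sequences (5'->3'). All five are palindromic,
# so scanning one strand finds sites on both strands.
SITES = {
    "Xho I": "CTCGAG",
    "Ava I": "CYCGRG",
    "Apa I": "GGGCCC",
    "Sph I": "GCATGC",
    "Nar I": "GGCGCC",
}

def find_sites(dna: str, sites: dict[str, str] = SITES) -> dict[str, list[int]]:
    """Return 0-based start positions of each recognition sequence in dna."""
    dna = dna.upper()
    hits: dict[str, list[int]] = {}
    for name, site in sites.items():
        pattern = "".join(IUPAC[base] for base in site)
        # A zero-width lookahead also reports overlapping matches.
        hits[name] = [m.start() for m in re.finditer(f"(?={pattern})", dna)]
    return hits

fragment = "ctcgagTTTTgcatgcAAggcgcc"  # hypothetical test fragment
print(find_sites(fragment))
```

The Xho I site CTCGAG is also an Ava I site, which is why the table labels it with both enzymes.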



Table 39: vgDNA to vary set #2 of BPTI 2.2

[Codon layout for codons 35 through 40, with a spacer and the Apa I site marked, not legible in this copy.]
1. A method of obtaining a protein that binds a
predetermined target that comprises:

a) preparing a variegated population of replicable
genetic packages, each package including a nucleic
acid construct coding on expression for an
outer-surface-displayed potential binding protein
comprising (i) a structural signal directing the
display of the protein on the outer surface of the
package and (ii) a potential binding domain for
binding said target, where a plurality of
different potential binding domains are displayed
by said population,

b) causing the expression of said proteins and the
display of said proteins on the outer surface of
such packages,

c) contacting the packages with target material so
that the potential binding domains of the proteins
and the target material may interact, and
separating packages bearing a binding domain that
binds target material from packages that do not so
bind, and

d) recovering and replicating at least one package
bearing a successful binding domain.

2. The method of claim 1 wherein the population of
replicable genetic packages of step (a) is
obtained by:
variegated population 
certain predetermined degree of affinity for
target material, and the required degree of
affinity is increased for each new variegated
population.

7. The method of claim 1 wherein the displayable
potential binding protein is a chimeric protein.

8. The method of claim 7 wherein said signal is
provided by a segment of said chimeric protein
which is essentially identical in amino acid
sequence with at least a functional portion of a
natural outer surface protein encoded by said
genetic package or a cell naturally infected by
said genetic package, said portion directing the
transport of said chimeric protein to the outer
surface of the genetic package.

9. The method of claim 2 wherein the second sequence
is obtained by operably linking a DNA sequence
encoding a potential outer surface transport
signal to a DNA sequence expressing a protein that
confers a selectable phenotype to obtain a test
construct, introducing the test constructs into
suitable hosts, causing expression of said DNA
construct, selecting genetic packages that display
the protein that confers the selectable phenotype
on their outer surface, and choosing as said
second sequence the DNA sequence encoding the
potential outer surface transport signal of one of
such selected genetic packages; wherein the
potential outer surface transport signals encoded
by the individual test constructs are non-identical.
17. The method of claim 3 in which the binding domain
of the known protein has a known sequence of amino
acids, and the identity and spatial relationship
of the amino acids forming a surface of said
domain is known.

18. The method of claim 3, said target material
comprising one or more discrete molecules, said
parental potential binding domain being
characterized as a sequence of amino acids,
further comprising identifying an interaction set
of amino acids which are on the surface of the
parental potential binding domain and which can
all simultaneously touch a single molecule of the
target material, and obtaining potential binding
domains by substituting a different amino acid for
one or more of the amino acids in said interaction
set.

19. The method of claim 3 wherein the level of
variegation of the population is chosen such that
the packages displaying potential binding domains
obtained by single amino acid substitutions in the
amino acid sequence of the parental potential
binding domain are present in detectable amounts.

20. The method of claim 3 wherein the amino acid
substitutions to be made are chosen after
consideration of the 3D structure of the parental
potential binding domain.

21. The method of claim 15 wherein the amino acid
substitutions to be made are for amino acids of
the chosen domain of the known protein which are
known to be alterable without reducing the melting





affinity separated and retain viability.

30. The method of claim 3 in which the initially
chosen parental potential binding protein has at
least one stable binding domain and said domain
has a melting point of at least 60°C and is stable
over a pH range of at least 3.0-8.0.

31. The method of claim 15 wherein the known binding
protein is an enzyme, the activity of which has a
deleterious effect on the replicable genetic
package, the host of the replicable genetic
package, or the target, wherein the majority of
the nucleic acid constructs code on expression for
an analogue of the known binding protein that does
not have such enzymatic activity.

32. The method of claim 1 wherein the target contains
ionizable groups and the pH of the solutions of
the intended use and the pH of the affinity
separations are chosen so that both the potential
binding protein and the target remain stable.

33. The method of claim 1 wherein the target contains
ionizable groups, further comprising providing
counter ions in affinity separations and the
solutions of the intended use to reduce
electrostatic repulsion between the potential
binding protein and the target.

34. The method of claim 1 wherein the initial
potential binding domain is picked so that, under
the conditions of intended use of the desired
binding protein and under the conditions of
affinity separation, the potential binding
thereof embodying an outer surface transport
signal.

45. The method of claim 42 wherein the signal is
provided by the gene III protein of M13 or a
segment thereof embodying an outer surface
transport signal.

46. The method of claim 3 wherein the initially chosen
parental potential binding domain is at least 50%
homologous with the binding domain of bovine
pancreatic trypsin inhibitor, having the residues
C5, G12, C30, F33, G37, C51 and C55.

47. The method of claim 46 further specifying that: a)
residue 21 contains one of the amino acids Y, F,
W, or I; b) residue 23 contains one of the amino
acids Y or F; c) residue 35 contains one of the
residues Y, F, or W; d) residue 40 contains one of
the amino acids G or A; e) residue 45 contains
either F or Y.

48. The method of claim 47 wherein the residues to be
varied are chosen from among residues 17, 19, 21,
27, 28, 29, 31, 32, 34, 48, 49, and 52.

49. The method of claim 48 wherein the additional
residues 9, 11, 15, 16, 18, 20, 22, 24, 26, 35,
47, and 53 are allowed to vary.

50. The method of claim 47 wherein the residues to
vary are picked from one of the interaction sets
identified in table 34.

51. The method of claim 2 wherein the distribution of
insensitive to UV, tolerant of desiccation, and
resistant to a pH of 2.0 to 10.0.

58. The method of claim 1 wherein the genetic packages
may be frozen and later revived.

59. The method of claim 1 wherein the genetic package
is a cell with a doubling time of 20-40 minutes.

60. The method of claim 1 wherein the genetic package
is a virus with a burst size of at least
100/infected cell.

61. The method of claim 1 wherein the genetic packages
are harvested by centrifugation without loss of
viability.

62. The method of claim 3 wherein the initially chosen
parental potential binding domain is selected from
the group consisting of (a) binding domains of
bovine pancreatic trypsin inhibitor, crambin,
ovomucoid, T4 lysozyme, hen egg white lysozyme,
ribonuclease, and azurin, and (b) domains at least
50% homologous with any of the foregoing domains
and which have a melting point of at least 60°C.

63. The method of claim 36 wherein the outer surface
transport signal is provided by the lamB protein
or a segment thereof embodying an outer surface
transport signal.

64. The method of claim 38 wherein the outer surface
transport signal is provided by the cotA, cotB,
cotC or cotD protein or a segment thereof
embodying an outer surface transport signal.
is further chosen to yield the largest value for
the quantity ((1 - abundance(stop codons)) times
(abundance of the least abundant amino
acid) / (abundance of the most abundant amino
acid)).

The protein of claim 66, wherein the protein
comprises a first foreign domain recognizing a
first target material and a second foreign domain
recognizing a second target material.
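Claim 1 describes a cycle: display, contact with target, separation of binders, and recovery with replication. Another claim adds that the required degree of affinity is increased for each new variegated population. The following is a toy, deterministic sketch of how repeated rounds enrich a tight binder. It rests on three assumptions:

- Binding is a simple one-step equilibrium, so the fraction bound is [T]/([T] + Kd).
- Packages that fail to bind are still carried over at a constant background rate.
- Amplification restores the population size without bias.

Increasing stringency is modeled by lowering the target concentration each round. The variant names, Kd values and concentrations are hypothetical.

```python
def select_round(freqs: dict[str, float], kds: dict[str, float],
                 target_conc: float, background: float) -> dict[str, float]:
    """One round: bind, separate, then amplify back to full size (renormalize)."""
    retained = {}
    for name, f in freqs.items():
        bound = target_conc / (target_conc + kds[name])  # equilibrium fraction bound
        # Bound packages are kept; unbound ones carry over at the background rate.
        retained[name] = f * (bound + background * (1.0 - bound))
    total = sum(retained.values())
    return {name: r / total for name, r in retained.items()}

def run_selection(freqs: dict[str, float], kds: dict[str, float],
                  concentrations: list[float], background: float = 1e-3) -> list[dict[str, float]]:
    """Apply successive rounds; lower concentrations mean higher stringency."""
    history = [freqs]
    for conc in concentrations:
        freqs = select_round(freqs, kds, conc, background)
        history.append(freqs)
    return history

# Hypothetical population: a rare tight binder among a weak parent and non-binders.
kds = {"nonbinder": float("inf"), "parent": 1e-6, "improved": 1e-8}
start = {"nonbinder": 0.99, "parent": 0.01 - 1e-6, "improved": 1e-6}
history = run_selection(start, kds, [1e-6, 1e-7, 1e-8])
for rnd, freqs in enumerate(history):
    print(rnd, f"{freqs['improved']:.2e}")
```

In this model the tight binder rises from one in a million to several percent of the population within three rounds. Lowering the target concentration is what lets it overtake the weaker parent.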